ANNALS OF THE NEW YORK ACADEMY OF SCIENCES
Volume XLI, Art. 2. Pages 77-168


Editor 
ERICH MAREN SCHLAIKJER


Associate Editor (Physics and Chemistry) 
THEODORE SHEDLOVSKY 


CRYSTALLINE PROTEIN MOLECULES 


By 


EDWIN J. COHN, I. FANKUCHEN, J. L. ONCLEY, H. B. VICKERY,
AND B. E. WARREN


NEW YORK 
PUBLISHED BY THE ACADEMY 
May 23, 1941 


THE NEW YORK ACADEMY OF SCIENCES 


(Lyceum of Natural History, 1817-1876)


OFFICERS, 1941 


President: Roy Waldo Miner


Vice-Presidents: Leslie E. Spock; Charles M. Breder, Jr.; John C.
Flanagan; Ralph Linton; Victor K. LaMer


Corresponding Secretary: C. Stuart Gager
Recording Secretary: Duncan A. MacInnes
Treasurer: Marvin D. Thorn

Librarian: Barnum Brown

Editor: Erich M. Schlaikjer

Elected Councilors:


1939-1941: Caryl P. Haskins; W. Reid Blair
1940-1942: W. J. V. Osterhout; Douglas W. Johnson
1941-1943: John Hendley Barnhart; Frank A. Beach


OFFICERS OF SECTIONS 
GEOLOGY AND MINERALOGY 
Chairman: Leslie E. Spock. Secretary: Eugene N. Cameron
BIOLOGY
Chairman: Charles M. Breder, Jr. Secretary: Ralph H. Cheney
Associate Editor: Richard P. Hall
PSYCHOLOGY
Chairman: John C. Flanagan. Secretary: Dorothea McCarthy
Associate Editor: Frank A. Beach
ANTHROPOLOGY
Chairman: Ralph Linton. Secretary: Kimball Young
Associate Editor: Nels C. Nelson
PHYSICS AND CHEMISTRY
Chairman: Victor K. LaMer. Secretary: Theodore Shedlovsky
Associate Editor: Theodore Shedlovsky


The regular sessions of the Academy are held on Monday evenings 
at 8:15 o’clock from October to May, inclusive, at The American 
Museum of Natural History, Central Park West at 79th Street, New 
York. 


ANNALS OF THE NEW YORK ACADEMY OF SCIENCES
Volume XLI, Art. 2. Pages 77-168
May 23, 1941 


CRYSTALLINE PROTEIN MOLECULES* 


By


EDWIN J. COHN, I. FANKUCHEN, J. L. ONCLEY, H. B. VICKERY,
AND B. E. WARREN


CONTENTS


Introduction to the Conference on Crystalline Protein Molecules.
By Edwin J. Cohn

Evidence from Organic Chemistry Regarding the Composition of Protein
Molecules. By H. B. Vickery

Evidence from Physical Chemistry Regarding the Size and Shape of
Protein Molecules from Ultra-centrifugation, Diffusion, Viscosity,
Dielectric Dispersion, and Double Refraction of Flow. By J. L. Oncley

The X-Ray Diffraction Methods Used in Protein Studies. By B. E. Warren

Evidence from X-Rays Regarding the Structure of Protein Molecules.
By I. Fankuchen


* This series of papers is the result of a conference on physical, physical-chemical and 
organic-chemical evidence regarding crystalline protein molecules held by the Section of 
Physics and Chemistry of the New York Academy of Sciences, February 2 and 3, 1940. 

Publication made possible through a grant from the income of the Ralph Winfred Tower 
Memorial Fund.

Manuscript received by the Editor February 1, 1941. 


INTRODUCTION TO THE CONFERENCE ON 
CRYSTALLINE PROTEIN MOLECULES 


By Edwin J. Cohn


From the Department of Physical Chemistry, Harvard Medical School, 
Boston, Massachusetts 


These conferences had their origin, I am told, in the notion that 
despite the large number of scientific meetings and symposia there 
remained a need for informal critical discussion of the problems con- 
fronting the active investigators in any given field. During the first 
year of this section, 1938-1939, I attended two of the three conferences. 
The first was on Electrophoresis and had been arranged by Dr. Duncan 
A. MacInnes; the third was on Dielectrics and had been arranged
by Dr. Charles P. Smyth. In both there was an impressive balance 
between experiment and theory. In both there was incisive comment 
as a result of which more than one new investigation has been under- 
taken in order to test tentative hypotheses put forward as possible 
explanations of the accumulated evidence. 

In both the conference on Electrophoresis and in that on Dielectrics 
the unifying principles were theoretical and technical. The laws of 
electrophoresis were considered in terms of all manner of ions, or of 
particles that are electrically charged. The application of dielectric 
theory was considered both to insulating materials and to biochemical 
systems. For the very nature of fundamental science is that laws de- 
veloped for one kind of material or system have validity that transcends 
the origins of these laws. 

In both the conference on Electrophoresis and the conference on 
Dielectrics reference was repeatedly made and deductions were drawn 
from investigations upon substances which, though diverse in nature
and function, form a well-defined class of molecules: the proteins.
The importance of these substances for biology and medicine, for
agriculture and industry, is such that they have in recent years been
investigated by many and ever increasingly powerful tools. In the 
course of these conferences it occurred to us, however, that the very 
specialized nature of the tools might well result in the future in the 
training of men who knew the tools more intimately than the sub- 
stances they were investigating. Indeed, each tool begins to have so 
interesting a history and to involve so intricate a theory that few not
trained in its use can interpret the important investigations that are
being reported and critically appraise the evidence that may be 
deduced from them. 

The approaches to protein chemistry have in recent years been with 
the tools of the physicist and the physical chemist as well as with those 
of the organic chemist who yielded us our first notion of the nitrogenous 
nature of the proteins, of their composition in terms of amino acids 
and of the manner in which the amino acids were bound together in 
polypeptide linkage. The development through organic chemistry, 
begun in the nineteenth century, has continued in the twentieth cen- 
tury with the discovery of new amino acids and improvement in the 
methods of hydrolysis and of the isolation of amino acids from the 
protein hydrolysates. These developments have led to increases in 
our knowledge of the amino acid composition of proteins and there- 
fore, in the ratio of the chemical groups of different kinds that are 
free in each protein. 

There would appear to be a growing conviction that differences in 
the properties of proteins, as a result of which some are elements of 
structure in biochemical systems, others enzymes, hormones or viruses, 
may inhere in the distribution and spatial arrangement of these chem- 
ical groups of diverse nature. The notion of distribution involves 
notions of size and shape and the most potent tools yet available for 
the investigations of these properties of proteins have been developed 
by physicists and physical chemists. 

The ultracentrifuge, developed by Svedberg, is by far the most 
powerful tool that we have for the study of the mass of the protein 
molecule. It has confirmed the idea that proteins are among the 
largest molecules known and has largely supplanted the earlier methods 
of estimating molecular weight by osmotic pressure or by ultrafiltra- 
tion as a multiple of an analytically deduced minimal molecular 
weight. The calculation of molecular weight from the sedimentation 
velocity of a molecule in the ultracentrifuge generally involves knowl- 
edge of its free diffusion. Diffusion measurements, if carried out with 
sufficient accuracy, would of themselves yield molecular weights if the 
molecules were spherical. Since most protein molecules appear not 
to be spherical, a combination of diffusion and sedimentation velocities 
has been employed in calculating molecular weights; of molecular
weights and diffusion constants in calculating asymmetry; or of
diffusion and viscosity (a property independent of size for
incompressible, uncharged, spherical molecules) in the calculation of
molecular weight.
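
The combination of sedimentation and diffusion described above is commonly written as the Svedberg relation, M = RTs / [D(1 - v̄ρ)], while for a spherical molecule diffusion alone fixes the radius through the Stokes-Einstein relation. The following is a minimal sketch of both calculations. The numerical inputs are illustrative values assumed for this example, not data from the conference, and the function names are hypothetical.

```python
import math

R = 8.314e7          # gas constant, erg / (mol K)
K_B = 1.380649e-16   # Boltzmann constant, erg / K
N_A = 6.02214076e23  # Avogadro's number, 1 / mol


def svedberg_molecular_weight(s, D, v_bar, rho, T):
    """Molecular weight (g/mol) from sedimentation and diffusion.

    s      sedimentation constant, seconds
    D      diffusion constant, cm^2 / s
    v_bar  partial specific volume of the protein, cm^3 / g
    rho    density of the solvent, g / cm^3
    T      absolute temperature, K
    """
    return R * T * s / (D * (1.0 - v_bar * rho))


def sphere_molecular_weight(D, v_bar, eta, T):
    """Molecular weight (g/mol) from diffusion alone; valid only for an unhydrated sphere."""
    radius = K_B * T / (6.0 * math.pi * eta * D)   # Stokes-Einstein radius, cm
    volume = 4.0 / 3.0 * math.pi * radius ** 3     # cm^3 per molecule
    return volume * N_A / v_bar


# Illustrative inputs (assumed for this sketch; solvent taken as water at 20 C).
s, D, v_bar = 4.4e-13, 6.3e-7, 0.749
rho, eta, T = 0.9982, 0.01002, 293.15

m_sed_diff = svedberg_molecular_weight(s, D, v_bar, rho, T)
m_sphere = sphere_molecular_weight(D, v_bar, eta, T)
print(f"sedimentation + diffusion: {m_sed_diff:,.0f}")
print(f"diffusion alone (sphere):  {m_sphere:,.0f}")
```

With these inputs the sphere-only figure comes out roughly twice the combined figure, the kind of discrepancy that signals an asymmetric or hydrated molecule.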


Protein molecules may be oriented if they are asymmetrical either 
with respect to shape, or to the distribution of their electrically charged 
groups. In the former case they reveal double refraction of flow, in 
the latter they increase the dielectric constant of the solvent. The 
amount of the latter effect yields, as a vector sum, the dipole distance
and moment of all the electrically charged groups of the protein. The 
frequencies at which dispersion of the dielectric constant occurs, and 
the shape of the curve, can be employed in the calculation of protein 
asymmetry if the molecular weight is known since the time of relaxa- 
tion of an oriented molecule is greater the greater its asymmetry. The 
relaxation times and the dimensions of proteins can be estimated both 
from measurements of the dielectric constant and of double refraction 
of flow, and compared with those derived from diffusion and viscosity. 

Each of these measurements, when carried out with adequate techni- 
cal skill, yields data of the utmost importance in the characterization 
of proteins, provided these have been so purified that only one kind
of molecular species is investigated. Some of these techniques them- 
selves yield evidence as to whether or not a given protein preparation 
consists of more than one molecular species. The interpretation of 
results when protein mixtures are investigated is far more difficult, 
however, than when only a single kind of chemical individual is 
studied. For the purposes of the present conference we propose there- 
fore to limit discussion to proteins which have thus far been isolated as 
relatively pure chemical individuals. The criteria of chemical
individuality are thus also worthy of discussion. Thus, crystallization
is unquestionably a great aid in purification, but crystallization, even
five recrystallizations (as in the case of the serum albumins of the
horse), is not adequate proof that a chemical individual has been
isolated, nor is electrophoretic mobility, dielectric dispersion, or
sedimentation velocity.

Protein molecules of the same family are often closely related and 
solubility measurements often reveal the existence of mixtures in 
systems even when all of the molecules have the same size, shape, and 
net charge. A protein whose solubility is independent of the amount 
of the solid crystalline phase with which it is in equilibrium may be 
considered a chemical individual. The composition of every prepara- 
tion of such a protein should be the same. The obstacles to deducing 
composition, let alone structure, from analytical measurements upon 
the amino acids of a mixture of proteins are, however, almost in- 
superable. 
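
The solubility criterion stated above can be sketched as a simple check: once excess solid is present, the amount in solution should no longer change as more solid is added. The data and the tolerance below are hypothetical, chosen only to show the contrast between a single component and a mixture.

```python
def has_constant_solubility(total_added, dissolved, rel_tolerance=0.02):
    """True if the dissolved amount stays constant once excess solid is present.

    total_added  grams of protein added per litre of solvent, one entry per trial
    dissolved    grams found in solution per litre at equilibrium, same order
    """
    # Keep only the trials in which some solid phase remained undissolved.
    saturated = [d for a, d in zip(total_added, dissolved) if a > d]
    if len(saturated) < 2:
        raise ValueError("need at least two trials with excess solid phase")
    mean = sum(saturated) / len(saturated)
    return (max(saturated) - min(saturated)) <= rel_tolerance * mean


# Hypothetical measurements, invented for this sketch.
added = [1.0, 2.0, 4.0, 8.0, 16.0]
pure_like = [1.0, 2.0, 3.0, 3.0, 3.0]      # plateau: behaves as one component
mixture_like = [1.0, 2.0, 3.5, 4.5, 5.5]   # keeps rising: more than one component

print(has_constant_solubility(added, pure_like))
print(has_constant_solubility(added, mixture_like))
```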

And yet sufficiently precise analytical measurements upon those
amino acids present in a pure protein in small amount should be of
the greatest value in determining the minimal molecular weight and 
thus aiding in the calculation of its true molecular weight. There 
should thus be a symbiotic relation between the organic chemist and 
the physical chemist in their attack upon the structure of proteins. 
In the isolation of a protein as a chemical individual the tools both 
of analytical and physical chemistry are of inestimable value. Once 
the protein is isolated and of proven purity, the organic chemist should 
be able to supply information regarding the nature of the groups that 
are present and something regarding their relations to each other 
which should be as helpful to the physical chemist as are his measure- 
ments to the analytical chemist. 

Three examples of interrelations between the analytical and physical 
chemistry of the proteins may be cited. The determination of the 
iron in hemoglobin early led to an accurate estimation of the minimal 
molecular weight of this protein. Measurements of sulfur and sulfide 
sulfur led T. B. Osborne to consider the molecular weights of many 
proteins in terms of this element, largely present as the amino acids 
methionine and cystine or cysteine. The latter amino acid is present 
in very small amount in many proteins, as are tryptophane and 
tyrosine, and the analyses available at that time led us, some fifteen 
years ago, to attempt a systematic evaluation of the minimal molecular 
weights of the proteins in terms of their amino acid compositions. 
Newer and more accurate analytical methods are now available for 
these and other amino acids and a far more accurate estimate of the 
minimal molecular weights of those proteins, which have been proven 
to be chemical individuals, should thus now be possible. I should 
rather see the analytical results employed in the calculation of minimal 
molecular weights, which they are capable of yielding without further 
assumption, than interpreted in terms of molecular weights determined 
by physical-chemical considerations. The results from the two inde- 
pendent approaches may thus be compared, the errors in each procedure 
be more accurately appraised, and the cumulative evidence lead to an 
overwhelming confidence in our methods.
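
The minimal molecular weight discussed above follows from simple stoichiometry: if a molecule must contain at least one atom or residue of a constituent, its weight cannot be less than the weight of that constituent divided by its mass fraction. A minimal sketch follows, assuming an illustrative iron content of 0.335 per cent for hemoglobin (a commonly quoted figure, not taken from this paper) and a hypothetical physical-chemical weight used only to show the comparison.

```python
def minimal_molecular_weight(unit_weight, percent_by_weight):
    """Smallest molecular weight containing one atom or residue of a constituent.

    unit_weight        atomic or residue weight of the constituent, g/mol
    percent_by_weight  analytical content of that constituent, per cent
    """
    if percent_by_weight <= 0:
        raise ValueError("content must be positive")
    return unit_weight * 100.0 / percent_by_weight


def units_per_molecule(true_weight, min_weight):
    """Nearest whole number of minimal units in a physically measured weight."""
    return round(true_weight / min_weight)


IRON_ATOMIC_WEIGHT = 55.85
iron_percent = 0.335        # assumed illustrative analysis
m_min = minimal_molecular_weight(IRON_ATOMIC_WEIGHT, iron_percent)

m_physical = 67_000         # hypothetical physical-chemical estimate
n = units_per_molecule(m_physical, m_min)
print(f"minimal weight {m_min:,.0f}; {n} iron atoms; implied weight {n * m_min:,.0f}")
```

Keeping the two steps separate mirrors the preference expressed above: the analytical result stands on its own, and the physical estimate enters only in the comparison.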

A second example of the comparison between the results of analytical 
and physical chemical studies of proteins could be derived from the 
estimates that have been made, on the one hand, of the acid-combining 
capacity of proteins, and on the other, of the dibasic amino acids, 
especially histidine, arginine and lysine. There would appear to be 
little doubt that, with the possible exception of a few terminal α-amino
groups, it is the ε-amino, the imidazole and the guanidine groups of
proteins that combine with acids. In the sixteen or more years since
we first became interested in this relation, though the results of the 
physical chemist have been amplified or changed in fewer instances 
than have those of the analytical chemist, the correspondence of the 
evidence from these totally different methods has come ever closer. 
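
The comparison described above amounts to counting basic side-chain groups from the analytical yields and setting the total beside the measured acid-combining capacity. A minimal sketch follows; the percentage yields and the titration figure are hypothetical, and the function name is an assumption of this example.

```python
# Molecular weights of the free basic amino acids, g/mol.
MOLECULAR_WEIGHT = {"histidine": 155.16, "arginine": 174.20, "lysine": 146.19}


def basic_groups_per_1e5_g(percent_yield):
    """Moles of basic side-chain groups per 100,000 g of protein.

    percent_yield maps amino acid name -> grams of that amino acid obtained
    per 100 g of protein. Each of the three amino acids is taken to
    contribute one acid-binding side-chain group.
    """
    return sum(1000.0 * pct / MOLECULAR_WEIGHT[name]
               for name, pct in percent_yield.items())


# Hypothetical analysis and titration result, invented for this sketch.
yields = {"histidine": 2.0, "arginine": 6.0, "lysine": 5.0}
total = basic_groups_per_1e5_g(yields)
acid_combining_capacity = 83.0   # moles of acid bound per 100,000 g, assumed

print(f"basic groups from analysis: {total:.1f} per 100,000 g")
print(f"difference from titration:  {acid_combining_capacity - total:.1f}")
```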

These basic groups as well as the carboxyl, the hydroxyl and sulf- 
hydryl, the methianyl, indole and benzene groups of proteins appear 
to be largely responsible for their chemical reactivity. The basic and
carboxyl groups are generally charged in neutral solution. In so far 
as the charged groups lead to the amphoteric properties of the proteins, 
problems arise which are of the greatest interest, but which Dr. Mac- 
Innes and I decided to omit from this discussion in favor of a future 
conference on the net charge of the protein molecule. 

In so far as these same groups give evidence regarding the com- 
position and structure of proteins, however, they must be considered 
here. Even in the present incomplete state of knowledge regarding
the amino acid composition of proteins several relations can be shown,
among them that between the nature of the individual groups of the 
protein and its density. The results of many investigators, especially 
those of Svedberg and his co-workers and of Adair, demonstrate that 
the densities of all proteins in solution, referred to their dry weight, 
are close to 1.33; that is, their specific volumes are close to 0.75. 
Some years ago, Edsall, McMeekin, and I had occasion to investigate 
the specific volumes of a large number of amino acids and peptides, of 
their derivatives and isomers. Certain additive volumes were deduced 
for the groups that they contained and found to be consistent not only 
with each other, but with the results deduced by Traube, from the study 
of other organic molecules. Expressed as the specific volumes of cer- 
tain of the groups of which proteins are composed, our results yield 
0.420 for COOH, 0.465 for CONH, 0.481 for NH₂, and 1.163 for CH₂.

Despite the wide variation between these results, the specific volumes 
of amino acid residues vary from 0.59 for aspartic acid, 0.65 for glycine 
and glutamic acid, and 0.68 for histidine to 0.80 for lysine, 0.85 for 
valine and 0.89 for leucine. The specific volumes of all proteins must 
fall within these limits. Those of proteins which yield large amounts 
of glycine and aspartic or glutamic acid on hydrolysis should be 
smallest, those containing large amounts of leucine or lysine largest. 
Even with the incomplete yields of amino acids that have thus far 
been isolated by the organic chemist, a specific volume of 0.74 may be 
calculated for egg albumin, as compared with the observed value of 
0.749, for insulin of 0.74 as compared with 0.749 observed and for 
edestin of 0.73 as compared with 0.744 observed. 
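
The calculation described above treats volumes as additive: the specific volume of a protein is the weight-averaged specific volume of its residues. A minimal sketch using the residue values quoted in the text follows; the composition is hypothetical and is not an analysis of any protein named here.

```python
# Specific volumes of amino acid residues quoted in the text, cm^3 per gram.
RESIDUE_SPECIFIC_VOLUME = {
    "aspartic acid": 0.59,
    "glycine": 0.65,
    "glutamic acid": 0.65,
    "histidine": 0.68,
    "lysine": 0.80,
    "valine": 0.85,
    "leucine": 0.89,
}


def specific_volume(residue_weights):
    """Weight-averaged specific volume for a mixture of residues.

    residue_weights maps residue name -> grams of that residue in the sample.
    Volumes are assumed additive, so the result is total volume / total weight.
    """
    total_weight = sum(residue_weights.values())
    total_volume = sum(w * RESIDUE_SPECIFIC_VOLUME[name]
                       for name, w in residue_weights.items())
    return total_volume / total_weight


# Hypothetical composition, grams of each residue per 100 g of the residues listed.
composition = {
    "aspartic acid": 10, "glycine": 5, "glutamic acid": 20, "histidine": 5,
    "lysine": 15, "valine": 15, "leucine": 30,
}
v = specific_volume(composition)
print(f"specific volume {v:.3f} cm^3/g, density {1 / v:.2f} g/cm^3")
```

Because the result is a weighted mean, it necessarily falls between the smallest and largest residue values, which is the bound stated in the text.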


Increase in our knowledge of the composition of proteins on the 
one hand, and of their partial specific and molal volumes on the other, 
should bring even closer the results from this and kindred relations 
between organic and physical-chemical evidence. 

During the period of this slow but steady progress in relating the 
properties of proteins to their composition there have been a succession 
of interesting theories of protein structure. The one in fashion when I 
was a student had the hexone bases at the center of the molecule, 
the mono-amino mono-carboxylic acids radiating from them. I still 
remember a beautiful stellate model of a protein that hung in T. B. 
Osborne’s laboratory when I was a student there. It stimulated talk
and suggested new experiments, especially on the hexone bases, and re- 
sulted in improvements in the analytical procedures for these amino 
acids. 

Then came a period of disruptive protein structure. The amino 
acids were said to be present as diketopiperazines and the molecular 
weights of the proteins—in phenol—to be about 300. But these low 
estimates were due rather to the water from which it is always difficult 
to free proteins, than to the proteins themselves. 

We hold no final view regarding the structure of proteins. We have 
no notion as to whether they are best regarded as peptide chains wound 
in some regular pattern or as hollow cages. The evidence from physical
chemistry concerns their size and shape, and something regarding the
number and distribution of their charged and reactive groups. We 
know that few proteins are spherical, that they are hydrated and com- 
pressible; and that most are readily denatured, either with increased, 
decreased, or unchanged molecular weight. 

The ultracentrifuge was early recognized as the most potent tool 
with which to study the sizes of proteins, and at an even earlier date 
the X-ray had been suggested as the most potent tool in the study 
of the structure of proteins. I remember arriving in Cambridge, Eng- 
land, in 1920 to study the work and methods of William Hardy, 
Frederick Hopkins, and Joseph Barcroft, and being shown by the latter 
a list of projects that had been drawn up, as I remember it, by the 
Medical Research Council, or some comparable body. On it was a 
suggested study of the X-ray diffraction of hemoglobin. That was 
twenty years ago this spring. There has since been an X-ray study of 
hemoglobin, and this study is, I understand, still continuing under 
Bragg’s direction at the Cavendish Laboratory in Cambridge. 

The first X-ray picture of a protein I ever saw was of edestin and 
had been made at the Kaiser Wilhelm Institute in Dahlem in 1926. 


At this time a great deal of work was being carried out on the X-rays 
of fibers, especially cellulose fibers. Meyer and Mark, who were
concerned with these studies in the laboratories of the I. G. Farbenindustrie,
also examined some protein fibers and had a study made of the di- 
mensions of the glycine molecule by X-ray methods by Hengstenberg 
and Lenel in their laboratories. As far as I know, this was the first 
study of an amino acid but the need to determine the dimensions of the 
amino acids of which proteins are composed was recognized by Bernal, 
who early carried out a survey, and is now being reconsidered by Paul- 
ing, Corey, and their co-workers. 

It is not my purpose in these introductory remarks to impinge upon 
the contributions of those who will come after me and who will have 
far more competence than I to discuss the X-ray work on proteins. 
It is but natural that so much of this work has been done in England, 
and Astbury’s work, especially on the fibrous proteins, and that of 
Bernal, Miss Crowfoot, Fankuchen, and others on protein crystals, 
is beginning to yield a great deal of information regarding a number 
of proteins. In how far the evidence derived from such studies can be 
employed in estimating the number of molecules per unit cell, the 
dimensions of the molecules and important elements in their structure 
will, I hope, emerge from this discussion. 

If the amino acid analysis of the organic chemist yields the minimal 
molecular weight, the X-ray analysis of the physicist yields the 
weight—a maximum weight—of the number of molecules in a unit cell. 
The molecular weight in solution, though it will sometimes equal either 
the minimum or maximum value, will generally be an intermediate 
integer. In terms of the molecular weights alone there is thus the 
possibility of relating the physical, physical-chemical, and organic- 
chemical evidence. 
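
The bracketing argument above can be phrased as a small search: a candidate molecular weight must be a whole multiple of the minimal weight and must divide the weight of protein in the unit cell a whole number of times. A minimal sketch follows, with hypothetical numbers and a tolerance parameter that is an assumption of this example.

```python
def candidate_weights(min_weight, cell_weight, tolerance=0.03):
    """List (multiple, weight, molecules_per_cell) consistent with both bounds.

    min_weight   minimal molecular weight from amino acid analysis
    cell_weight  weight of protein in one unit cell from X-ray analysis
    tolerance    allowed fractional mismatch from a whole number of molecules
    """
    candidates = []
    max_multiple = round(cell_weight / min_weight)
    for multiple in range(1, max_multiple + 1):
        weight = multiple * min_weight
        per_cell = cell_weight / weight
        nearest = round(per_cell)
        if nearest >= 1 and abs(per_cell - nearest) <= tolerance * nearest:
            candidates.append((multiple, weight, nearest))
    return candidates


# Hypothetical inputs: minimal weight 17,000 and 136,000 per unit cell.
for multiple, weight, per_cell in candidate_weights(17_000, 136_000):
    print(f"{multiple} x minimal = {weight:,}  ->  {per_cell} molecule(s) per cell")
```

The search only narrows the field; choosing among the surviving candidates still requires the molecular weight in solution.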

The X-ray photographs of the same protein when it is wet and after 
it has been dried reveal certain changes probably produced by the 
removal of water from the crystals. Is the water estimated in this way 
related to the water that Sørensen long since determined to be present
in crystalline egg albumin and to the water of hydration that Adair 
and his co-workers have estimated to be present from a variety of 
methods? What volume of water is subjected to electrostriction by 
the charged groups of the protein? In how far will X-ray studies 
tell us how this water is distributed and how it is held? 

The interpretation of X-ray patterns has become a vastly complex, 
technical subject and we shall hope to hear in this discussion how far 
the structure can be deduced from X-ray studies alone, how far this
information is independent of, how far supplemental to knowledge of
proteins gained by other physical, physical-chemical, and organic- 
chemical methods. In the end, the X-ray pattern must depend upon 
the arrangements of the same groups that the organic chemist tells us 
are present and any distribution of these groups suggested by physical- 
chemical measurements must also be consistent with deductions from 
X-ray analysis. Indeed, it would appear an inevitable conclusion that 
sooner or later the evidence from these three different approaches to 
protein chemistry must yield a consistent and far more complete 
picture of the protein molecule than is now available. It was our 
thought in arranging for this conference that the time might well be 
at hand to explore the present state of our knowledge; to discover the 
nature of the assumptions and the nature of the evidence on the basis 
of which new hypotheses may be formulated and new investigations 
planned. 


EVIDENCE FROM ORGANIC CHEMISTRY 
REGARDING THE COMPOSITION 
OF PROTEIN MOLECULES 


By H. B. Vickery


From the Connecticut Agricultural Experiment Station, New Haven, Connecticut 


The study of proteins has undergone a very interesting sequence of 
changes when considered from the standpoint of the scientific back- 
ground of the workers in the field. The earliest contributors were for 
the most part affiliated with medicine. Beccari (1682-1766), who in 
1747 prepared the first protein of vegetable origin, was both physician 
and natural philosopher. At various stages in his career he taught 
chemistry, physics, mathematics, logic, anatomy, and medicine, and 
made definite contributions in the fields of human nutrition, of weather 
observation and of disease. Fourcroy (1755-1809), to whom we owe 
the terms albumin and gelatin, was a distinguished physician and 
organizer of medical education. Scheele (1742-1786), who carried 
out the first noteworthy chemical investigation of casein, was an 
apothecary. Braconnot (1780-1854), who discovered glycine and 
leucine in 1820, was also at one stage in his career a pharmacist. 
Wollaston (1766-1828), the discoverer of cystine, was a physician and 
physicist as well as chemist. 

From such investigators one would expect shrewd factual observa- 
tion but not analysis nor theory and, even in the next stage, when 
organic chemists assumed the burden of investigation, attempts to 
account for observations were rare. Berzelius supplied nomenclature 
(cystine, glycine); Mulder coined the word protein and advanced 
the first speculation on what these substances were, thereby stimulat- 
ing much research; Liebig discovered tyrosine and taught Ritthausen, 
the first of the great protein chemists. 

The period from 1820 to 1900 was one in which organic chemists 
and those whose interest in physiology converted them into the first 
generation of biochemists, such men as Kühne, Hofmeister, Kossel, and
Schulze, advanced our knowledge enormously. They laid a founda- 
tion that made the emergence of a Fischer inevitable, and perhaps 
their greatest contribution was the discovery of the various amino 
acids of which the protein molecule is formed. Under the spur of 
Fischer’s formulation of an acceptable protein theory, as well as of his 
own discovery of no less than three protein amino acids, progress
became increasingly rapid. New types of investigation were undertaken
which have led logically to the development of the present phase
of protein chemistry in which physical chemists are playing a most 
important role. 

It is my assignment, in the present conference on general protein 
chemistry, to outline some of the major contributions of the organic 
chemist. It seems to me that the most important of these contribu- 
tions has been the concept that proteins are, from the point of view of 
structure, composed of amino acids. With respect to the details of this 
structure we know relatively little, although this is a theme that oc- 
cupies many minds at the present time. Regardless of details of how 
they are combined with each other, however, it is fundamental to 
know, first, what are these amino acids, and, second, what are the 
proportions in which they are found in the protein molecule. It is with 
these two questions that I propose to occupy your time. 


AMINO ACID THEORY OF PROTEIN CONSTITUTION 


We are today so accustomed to the view that amino acids, in some 
form of combination, make up the bulk, at any rate, of protein mole- 
cules that few stop to inquire how this notion first became current. 
It is, of course, a notion implicit in the earliest work, that of Braconnot 
and of Liebig and his student Bopp, but I think it received its earliest 
explicit statement in Ritthausen’s great book on “The Proteins of the 
Cereals, Legumes and Oil-Seeds,” published at Bonn in 1872. The 
third section of this book is entitled “The decomposition products of
gluten proteins, of legumins, and of conglutins”, these being three 
classes of proteins he had recognized and differentiated, and in the 
first paragraph he points out that glutamic and aspartic acids are 
yielded in different proportions by different proteins and accordingly 
may serve “for the characterization of protein substances and as a 
foundation for the recognition of their individuality.” This is a precise 
statement of the purpose of amino acid analysis and a recognition of 
its fundamental significance. Later in this chapter (on p. 222), he 
gives the first table of the amino acid composition of proteins to be 
published. It is very simple—only two amino acids are included— 
but the germ of the idea is there. 

Within a year, and so far as I can tell quite independently, Hlasiwetz
and Habermann¹ arrived at a similar conclusion and published the first
amino acid analysis of casein. They did not give quantitative values
but they maintained that glutamic acid, aspartic acid, leucine, tyrosine,
and ammonia account for practically the whole of the casein molecule.

¹ Hlasiwetz, H., & Habermann, J. Jour. prakt. Chem. 7: 397. 1873; Ann. 169: 150. 1873.

This view was not shared by all. Schulze grasped it, as did Kossel, 
Fischer, and Hofmeister, but, to the great Kühne and his school,
amino acids were quite unimportant and Kühne’s pupil Neumeister,
in the 1897 edition of his textbook, devotes amazingly little attention 
to them. Tyrosine, leucine, and aspartic acid are regular digestion 
products; glutamic acid can be formed by the action of strong acids; 
glycine, alanine, and serine have some significance in connection with 
silk and that is all. It required Hofmeister, Fischer, and Kossel to 
make the significance of amino acids clear. 

Today we are faced with a new problem, or rather with the modern 
development of this same old problem. In its present-day dress, this 
problem takes the form of the question, which are the amino acids 
that are of significance in protein chemistry? There is universal 
agreement that certain amino acids are of importance, and there is 
universal agreement that certain others are not. But there is a most 
interesting group of these substances about which opinions may and do 
differ widely, and it seems to me that we may well pause to inquire 
what is required by way of evidence before a given substance may 
be, as it were, admitted into this inner circle of the elect.


CRITERIA FOR ACCEPTANCE OF AN AMINO ACID 


Some years ago Schmidt and the author² somewhat arbitrarily set
up certain criteria by which the validity of a claim for the presence 
of any given amino acid among the products of hydrolysis of proteins 
might be tested. Our motives were thoroughly practical and entirely 
selfish. We were considering each authentic amino acid in detail and 
we did not wish to encumber our discussion with a good deal of ma- 
terial that we were convinced was rubbish. On the other hand, there 
were several cases—borderline cases if you will—where the evidence 
appeared almost adequate but that we felt required further study. In 
order to avoid saying with such grace as we could that we did not 
accept the data as published, we decided that, in order to be accepted, 
an amino acid must also have been isolated by some worker other than 
the discoverer and that its constitution must have been established 
by suitable synthetic means. Obviously, too, it must have been de- 
rived from a pure protein by hydrolysis, and must be thoroughly 
characterized by the preparation of salts and other derivatives. It was 
desirable, as further conclusive evidence, that its utilization by the 
animal body should have been demonstrated. 


² Vickery, H. B., & Schmidt, C. L. A. Chem. Rev. 9: 169. 1931.


TABLE 1

CLASSIFICATION OF AMINO ACIDS

A. Amino acids concerning which there is no doubt whatever, arranged in
chronological order of discovery as products of protein hydrolysis.

 1. Glycine¹              10. Arginine⁹
 2. Leucine¹              11. Histidine¹⁰
 3. Tyrosine²             12. Valine¹¹
 4. Serine³               13. Proline¹²
 5. Glutamic acid⁴        14. Tryptophane¹³
 6. Aspartic acid⁵        15. Hydroxyproline¹⁴
 7. Phenylalanine⁶        16. Isoleucine¹⁵
 8. Alanine⁷              17. Methionine¹⁶
 9. Lysine⁸               18. Threonine¹⁷

B. Amino acids that occupy a special position because of their narrow
range of distribution or for other reasons.

19. Thyroxine¹⁸ (thyroid gland proteins)
20. Diiodotyrosine or iodogorgoic acid¹⁹ (thyroid gland protein and
    skeleton protein of certain marine organisms)
21. Dibromotyrosine²⁰ (skeleton of Primnoa lepadifera)
22. Norleucine²¹ (spinal cord protein)
23. Cystine²² (universally distributed; mode of linkage of sulfur still
    in debate)
24. Cysteine²³ (evidence from nitroprusside reaction of few native
    proteins)
25. Hydroxyglutamic acid²⁴ (existence in proteins doubted by some
    investigators)

C. Amino acids known as plant constituents that may possibly be expected
to be found in proteins.

 1. Thiolhistidine²⁵ (in ergot as betaine ergothioneine; in blood)
 2. Dihydroxyphenylalanine²⁶, ²⁷ (in bean seedlings; probably widely
    distributed)
 3. Citrulline²⁸ (in watermelon tissue; probably of metabolic
    significance in urea formation in animals)
 4. Canavanine²⁹ (in certain beans)
 5. Djenkolic acid³⁰ (in Djenkol bean)

D. Amino acids for which claims have not been substantiated.

 6. Amino butyric acid³¹
 7. Hydroxyvaline³²
 8. Hydroxylysine³³
 9. Norvaline³⁴, ³⁵
10. Diaminoglutaric acid³⁶
11. Diaminoadipic acid³⁶
12. Hydroxyaspartic acid³⁶
13. Dihydroxydiaminosuberic acid³⁶
14. “Caseianic acid”³⁶
15. “Caseinic acid”³⁶
16. Prolysine (ε-amino-ε-hydantoin caproic acid)³⁷
17. Hyphasamine (Ci¢H2,OsN2)³⁸
18. Dodecandiaminodicarboxylic acid³⁹
19. Base C.Hi10;N⁴⁰
20. Protoctine (CsH1;03Ns)⁴¹
21. Diaminotrihydroxydodecanic acid
22. Hydroxytryptophane


¹ Braconnot, H. Ann. chim. phys. (2) 13: 113. 1820.
² Bopp, F. Ann. 69: 16. 1849.
³ Cramer, E. Jour. prakt. Chem. 96: 76. 1865.
⁴ Ritthausen, H. Jour. prakt. Chem. 99: 454. 1866.
⁵ Ritthausen, H. Jour. prakt. Chem. 103: 233. 1868.
⁶ Schulze, E., & Barbieri, J. Ber. 14: 1785. 1881.
⁷ Weyl, T. Ber. 21: 1407. 1888.
⁸ Drechsel, E. Jour. prakt. Chem. 39: 425. 1889.
⁹ Hedin, S. G. Zeit. physiol. Chem. 20: 186. 1895.
¹⁰ Hedin, S. G. Zeit. physiol. Chem. 22: 191. 1896-7.
¹¹ Kossel, A. Zeit. physiol. Chem. 22: 176. 1896-7.
¹² Fischer, E. Zeit. physiol. Chem. 33: 151. 1901.
¹³ Hopkins, F. G., & Cole, S. W. Proc. Roy. Soc. London 68: 21. 1901.
¹⁴ Fischer, E. Ber. 35: 2660. 1902.
¹⁵ Ehrlich, F. Ber. 37: 1809. 1904.
¹⁶ Mueller, J. H. Proc. Soc. Exp. Biol. & Med. 19: 161. 1922.
¹⁷ Meyer, C. E., & Rose, W. C. Jour. Biol. Chem. 115: 721. 1936.
¹⁸ Kendall, E. C. Trans. Assoc. Am. Physicians 30: 420. 1915.
¹⁹ Drechsel, E. Zeit. Biol. 33: 96. 1896.
²⁰ Mörner, C. T. Zeit. physiol. Chem. 88: 138. 1913.
²¹ Abderhalden, E., & Weil, A. Zeit. physiol. Chem. 81: 207. 1912. 84: 39. 1913.
²² Mörner, K. A. H. Zeit. physiol. Chem. 28: 595. 1899.
²³ Mirsky, A. E., & Anson, M. L. Jour. Gen. Physiol. 18: 307. 1935.
²⁴ Dakin, H. D. Biochem. Jour. 12: 290. 1918.
²⁵ Eagles, B. A., & Johnson, T. B. Jour. Am. Chem. Soc. 49: 575. 1927.
²⁶ Torquati, T. Arch. farm. sper. 15: 213. 1913.
²⁷ Guggenheim, M. Zeit. physiol. Chem. 88: 276. 1913.
²⁸ Wada, M. Biochem. Zeit. 224: 420. 1930.
²⁹ Kitagawa, M., & Tomita, T. Proc. Imp. Acad. Japan 5: 380. 1929.
³⁰ van Veen, A. G., & Hyman, A. J. Rec. trav. chim. 54: 493. 1935.
³¹ Abderhalden, E., & Weil, A. Zeit. physiol. Chem. 81: 207. 1912.
³² Schryver, S. B., & Buston, H. W. Proc. Roy. Soc. London B99: 476. 1925.
³³ Schryver, S. B., Buston, H. W., & Mukherjee, D. H. Proc. Roy. Soc. London B98: 58. 1925.
³⁴ Abderhalden, E., & Bahn, A. Ber. 63: 914. 1930.
³⁵ Abderhalden, E., & Reich, F. Zeit. physiol. Chem. 193: 198. 1930.
³⁶ Skraup, Z. H. Zeit. physiol. Chem. 42: 274. 1904.
³⁷ Wada, M. Proc. Imp. Acad. Japan 9: 43. 1933.
³⁸ Engeland, R. Biochem. Jour. 19: 850. 1925.
³⁹ Frankel, S., & Friedmann, M. Biochem. Zeit. 182: 434. 1927.
⁴⁰ Gortner, R. A., & Hoffman, W. F. Jour. Am. Chem. Soc. 47: 580. 1925.
⁴¹ Schryver, S. B., & Buston, H. W. Proc. Roy. Soc. London B100: 360. 1926.
⁴² Fischer, E., & Abderhalden, E. Zeit. physiol. Chem. 42: 540. 1904.
⁴³ Fischer, E. Zeit. physiol. Chem. 99: 54. 1917.
⁴⁴ Abderhalden, E., & Kempe, M. Zeit. physiol. Chem. 52: 207. 1907.
⁴⁵ Abderhalden, E., & Sickel, H. Zeit. physiol. Chem. 144: 80. 1925.




By the use of these criteria, we eliminated from discussion a series 
of substances for which claims have been advanced from time to time. 
Some of these occupy an altered position today. 

In TABLE 1, the amino acids are classified in four groups; in the
first are eighteen substances about which there is no question whatever. 
These, at least, have been positively identified among the products of 
hydrolysis of proteins and, moreover, most of them may be expected 
to be present in larger or smaller amount in the hydrolysate of any 
given protein. 

A second group has been formed of amino acids which are either 
rarely encountered because of a narrow range of known distribution, 
or because certain details of their mode of combination in the protein 
molecule are still in debate. 

A third group includes amino acids known in nature as plant con- 
stituents and therefore of significance in amino acid metabolism. None 
of these substances has been as yet recognized as a product of protein 
hydrolysis. 

In a fourth group are placed substances, or preparations, which 
figure in the literature as protein amino acids but concerning which 
more or less doubt is still entertained. 

In the following paragraphs, a number of these amino acids are dis- 
cussed in some detail in order to show what may be regarded as the 
present position of our knowledge of them. 


Amino Butyric Acid 
Claimed by Schutzenberger and Bourgeois³ in 1875, by Foreman⁴
in 1913, by Abderhalden and Weil⁵ in 1912, and by Abderhalden and
Bahn⁶ in 1937, but insufficiently characterized.


Hydroxyamino Butyric Acid 

Claimed by Schryver and Buston in 1925, by Gortner and Hoffman⁷
in 1925, and by Rimington⁸ in 1927. None of these preparations was
completely characterized, and the position of the hydroxyl group was
not established. In 1935, Rose and his collaborators⁹ demonstrated
conclusively that this substance is present in several proteins, notably 
casein, and have since established its constitution by synthesis and 

³ Schutzenberger, P., & Bourgeois, A. Compt. rend. 81: 1191. 1875.
⁴ Foreman, F. W. Biochem. Zeit. 56: 1. 1913.
⁵ Abderhalden, E., & Weil, A. Zeit. physiol. Chem. 81: 207. 1912.
⁶ Abderhalden, E., & Bahn, A. Zeit. physiol. Chem. 245: 246. 1937.
⁷ Gortner, R. A., & Hoffman, W. F. Jour. Am. Chem. Soc. 47: 580. 1925.
⁸ Rimington, C. Biochem. Jour. 21: 1187. 1927.
⁹ Meyer, C. E., & Rose, W. C. Jour. Biol. Chem. 115: 721. 1936.




named it threonine. It is now regarded as thoroughly established 
as a protein decomposition product. 


Hydroxyvaline

Claimed by Schryver and Buston in 1925, by Czarnetzky and
Schmidt¹⁰ in 1931 and by Brazier¹¹ in 1930. Abderhalden and Heyns¹²
discussed this substance in 1934 and expressed serious doubt of its
existence in proteins.


Hydroxylysine

Claimed by Schryver and his associates in 1925 and 1927, and by
Van Slyke and his associates¹³ in 1938 as a constituent of gelatin,
but it has not yet been sufficiently characterized, nor has its
constitution been established.


Norvaline 


Claimed by Abderhalden and Bahn in 1930 and also by Abderhalden
and Reich the same year. In 1932, Abderhalden and Heyns¹⁴ ad-
vanced further evidence for their claim. Doubt of the existence of
this substance among the products of hydrolysis of proteins may still 
quite properly be entertained since it has been secured in only one 
laboratory; nevertheless Abderhalden’s evidence is quite extensive. 


Norleucine 


Claimed by Thudichum in 1901, and by Abderhalden and Weil in
1912¹⁵ as a constituent of spinal cord protein. More recently it has
been claimed by Yaginuma, Arai, and Hayakawa¹⁶ also in spinal cord
protein, by Nuccorini¹⁷ in castor bean protein, and by Abderhalden
and Heyns.¹⁸ Czarnetzky and Schmidt¹⁹ have confirmed much of
Abderhalden’s early work and advance evidence very little if at all 
short of being completely conclusive that this substance is present 
among the products of hydrolysis of spinal cord protein. The dif- 
ficulty is to distinguish between the physical properties of leucine, 
isoleucine, and norleucine. Small differences only are to be expected 
and separations are accordingly unusually difficult. However, many 
if not most students are prepared to accept the evidence today. 

¹⁰ Czarnetzky, E. J., & Schmidt, C. L. A. Jour. Biol. Chem. 92: 453. 1931.
¹¹ Brazier, M. A. B. Biochem. Jour. 24: 1188. 1930.
¹² Abderhalden, E., & Heyns, K. Zeit. physiol. Chem. 229: 236. 1934.
¹³ Van Slyke, D. D., Hiller, A., Dillon, R. T., & MacFadyen, D. Proc. Soc. Exp. Biol. Med. 38: 548. 1938.
¹⁴ Abderhalden, E., & Heyns, K. Zeit. physiol. Chem. 206: 137. 1932. 209: 27. 1932.
¹⁵ Abderhalden, E., & Weil, A. Zeit. physiol. Chem. 84: 39. 1913.
¹⁶ Yaginuma, T., Arai, G., & Hayakawa, K. Proc. Imp. Acad. Tokyo 8: 91. 1932.
¹⁷ Nuccorini, R. Ann. chim. appl. 24: 25. 1934.
¹⁸ Abderhalden, E., & Heyns, K. Zeit. physiol. Chem. 214: 262. 1933.
¹⁹ Czarnetzky, E. J., & Schmidt, C. L. A. Jour. Biol. Chem. 97: 333. 1932.




3, 5-Dibromotyrosine 


This was isolated by Mörner²⁰ from the skeleton of a coral unusually
high in bromine. A survey of a number of marine organisms showed 
that bromine is widely distributed, although usually present in only 
small amounts, and the richest species was investigated. The validity 
of Mörner’s observation has never been called in question although,
so far as I know, it has not been repeated. Dibromotyrosine may 
probably be regarded as a well-established amino acid of very limited 
known distribution and thus occupies a position like that of norleucine. 
The iodine analogue is probably widely distributed in marine organisms 
and in mammalian thyroid tissue. 


Dihydroxyphenylalanine 


This substance is known in nature in the seed pods of certain legumes 
and is also recognized as a product of enzymatic oxidation of tyrosine.²¹
It occupies an important position in theories of melanin formation in 
plant tissues and animals. So far it has not been observed among the 
products of hydrolysis of proteins but is by no means an improbable 
component. 

Thiolhistidine 

It is represented in nature by the betaine ergothioneine, found in ergot
by Tanret and now known to be a constituent of the blood of
animals fed a corn diet.²² The amino acid has been sought for in
zein, but, save that hydrolysates of this protein give what is held 
by Hunter to be a specific color test for the thiolimidazole ring, with- 
out success. 


Canavanine 


Isolated by Kitagawa and Tomita²³ in 1929 from extracts of the jack
bean, it was found to be a basic substance of the formula C₅H₁₂N₄O₃
and was later shown²⁴ to be derived from γ-hydroxy-α-aminobutyric
acid. Its constitution was established by Gulland and Morris²⁵ as a
hydroxyguanidino derivative of this acid, thus confirming the structure 
assigned by the Japanese workers. Its significance in protein chem- 
istry has yet to be established. 


²⁰ Mörner, C. T. Zeit. physiol. Chem. 51: 33. 1907. 88: 138. 1913.
²¹ Evans, W. C., & Raper, H. S. Biochem. Jour. 31: 2155. 1937.
²² Eagles, B. A., & Vars, H. M. Jour. Biol. Chem. 80: 615. 1928.
²³ Kitagawa, M., & Tomita, T. Proc. Imp. Acad. Tokyo 5: 380. 1929.
²⁴ Kitagawa, M., & Mononobe, S. Jour. Agr. Chem. Soc. Japan 9: 845. 1933.
²⁵ Gulland, J. M., & Morris, C. J. O. R. Jour. Chem. Soc. 763. 1935.




Citrulline 


In 1914, Koga and Odake²⁶ described the isolation of a substance
C₆H₁₃N₃O₃ from the juice of watermelon. Aside from the fact that it
formed a copper salt, little else was recorded. In 1930, Wada prepared
the substance again and showed that its properties were best ex-
plained on the assumption that it is δ-carbamido ornithine. This was
confirmed by synthesis.²⁷ Shortly afterwards Ackermann²⁸ repeated
Wada’s work and obtained citrulline not only from watermelon ex-
tract, but also from the products of the action of putrefactive organisms
on arginine. Horn²⁹ likewise observed this last transformation al-
though the yield was low.

Wada has also claimed³⁰ the isolation of citrulline from a tryptic
digest of casein but the possibility of enzymatic production from 
arginine was not excluded. 

Fearon³¹ has recently described a color test which permits dis-
crimination between citrulline and all of the well-known protein amino
acids. This test is given by many proteins (egg albumin, casein, 
fibrin, etc.) and is regarded as evidence that citrulline exists preformed 
in the protein molecule. Incontrovertible evidence from isolation has 
not yet, however, been secured. 


Djenkolic Acid 


Prepared in 1935 by van Veen and Hyman³² from extracts of the
tropical Djenkol bean, it was held to be the cysteine thioacetal of
formaldehyde, a substance which occupies an interesting relationship
to both cystine and methionine. This structure was confirmed by
du Vigneaud and Patterson³³ by synthesis and demonstration of identity
with van Veen’s own preparation. The possibility that djenkolic acid 
may be found among the products of hydrolysis of proteins has not as 
yet been excluded, but the evidence available suggests that it can be 
expected only in rare cases. 

Ornithine 

Reference should also be made to ornithine, the next lower homo-
logue of lysine. The δ-guanidino derivative of this substance is the


²⁶ Koga, Y., & Odake, S. Jour. Tokyo Chem. Soc. 35: 519. 1914.
²⁷ Wada, M. Proc. Imp. Acad. Tokyo 6: 15. 1930. Biochem. Zeit. 224: 420. 1930.
²⁸ Ackermann, D. Zeit. physiol. Chem. 203: 66. 1931.
²⁹ Horn, F. Zeit. physiol. Chem. 216: 244. 1933.
³⁰ Wada, M. Proc. Imp. Acad. Tokyo 8: 367. 1932.
³¹ Fearon, W. R. Biochem. Jour. 33: 902. 1939.
³² van Veen, A. G., & Hyman, A. J. Rec. trav. chim. 54: 493. 1935.
³³ du Vigneaud, V., & Patterson, W. I. Jour. Biol. Chem. 114: 533. 1936.




well-known basic amino acid arginine, but so far as I am aware, 
ornithine has never been isolated from a protein except after treatment 
with arginase or with strong alkali. However, its isolation presents 
very special difficulties so that this is not in itself surprising. 
Although not as yet to be classified as a protein amino acid, orni- 
thine occupies an important position in certain theories of amino acid 
metabolism.³⁴ The existence of an enzyme that decomposes arginine
into urea and ornithine³⁵ is most significant for animal biochemistry,
and the isolation of δ-N-acetyl ornithine from the roots of Corydalis
ochotensis, a Siberian species of fumitory, by Manske³⁶ suggests that
ornithine may play a role also in certain plants. The occurrence of
ornithuric acid (dibenzoyl ornithine) in the excreta of birds to which
benzoic acid has been administered has, of course, long been known. 


Selenium Compound 


It is perhaps of interest to mention a compound of selenium re-
cently isolated by Horn and Jones³⁷ from plants grown in a
selenium-containing soil. No information is yet available with regard 
to the source or the method of preparation, but the compound is held 
to be an amino acid and the structure provisionally assigned contains 
a cysteine and a homocysteine radical both attached to a single 
selenium atom. Whatever its structure, it is a natural amino acid of 
a new kind that will undoubtedly prove to be of unusual importance. 


Other Compounds 


In addition to these substances there is a long list of claims, some 
of which have been shown to be ill-founded, others of which still stand 
in the literature. Such substances as diaminoglutaric acid, diamino- 
adipic acid, hydroxyaminosuccinic (hydroxyaspartic) acid, di- 
hydroxydiaminosuberic acid, claimed by Skraup in 1904, together 
with two additional vague substances called “caseianic” acid and 
“caseinic” acid, probably represent little more than errors of judgment. 
Fischer contributed diaminotrihydroxydodecanic acid in 1904, but 
withdrew it in 1917. Hydroxytryptophane was claimed by Abder- 
halden and Kempe in 1907, but was later found to have been a mixture 
of a peptide of tyrosine and proline with a little tryptophane. 

Still other unconfirmed reports are those of Wada³⁸ who claims to

³⁴ Krebs, H. A., & Henseleit, K. Zeit. physiol. Chem. 210: 33. 1932.
³⁵ Kossel, A., & Dakin, H. D. Zeit. physiol. Chem. 41: 321. 1904. 42: 181. 1904.
³⁶ Manske, R. H. T. Canadian Jour. Res. B15: 84. 1937.
³⁷ Horn, M. J., & Jones, D. B. Jour. Am. Chem. Soc. 62: 234. 1940.
³⁸ Wada, M. Proc. Imp. Acad. Tokyo 9: 43. 1933.




have prepared α-amino-ε-hydantoin capronic acid, which he named
prolysine, from gelatine. Wada³⁰ has also claimed the isolation of
citrulline from proteins. Engeland³⁹ in 1925 obtained a material from
elastin of the composition C₁₆H₂₄O₅N₂ that he called hyphasamine
and regarded as a peptide of tyrosine with a hitherto unknown amino 
acid. 

Frankel and Friedmann⁴⁰ in 1927 believed they had obtained a
dodecandiaminodicarboxylic acid from casein, and two years later
Frankel and Monasterio⁴¹ claimed still another new substance that
they had obtained from hemoglobin. Gortner and Hoffman⁷ in 1925
prepared from teozein a basic substance to which the formula
C₄H₁₁O₃N was ascribed. Schryver and Buston⁴² the following year
obtained a substance C₈H₁₅O₃N₃ from castor bean protein that they
named “protoctine”.

Even this list is probably incomplete, but sufficient has been included 
to illustrate the point I wish to make, namely that the isolation of a 
new amino acid is a matter that calls for a degree of self-criticism, 
in addition to skill, patience, and chemical insight, that is extremely 
rare. I sincerely believe that announcements of new amino acids must 
be treated with the utmost conservatism. Doubtless it is difficult if 
not impossible to set up rigid criteria for acceptance, but the very 
fact that about as many substances have been claimed as new pro- 
tein amino acids in the past thirty-five years as have been satisfactorily 
established in the past one hundred and twenty shows the danger to 
which students are exposed. I know of very few textbooks that con- 
tain a list of protein amino acids that is above criticism. Even 
Schmidt’s recent handbook omits threonine from the list of accepted 
amino acids and includes it in the group of those reported but not 
verified, although norleucine appears in the accepted list. To my mind 
the evidence for threonine so far outweighs that for norleucine as to 
leave no room for argument. 

Two serious problems remain to be discussed in this connection: 
these are the position of cystine and cysteine, and the position of 
hydroxyglutamic acid. 

Cystine is a disulfide and, according to the views of protein consti- 
tution advanced by Astbury, and by Mirsky and Pauling, represents a 
configuration within the molecule whereby two peptide chains, or 
possibly two folds of the same chain, are linked together through the 

³⁹ Engeland, R. Biochem. Jour. 19: 850. 1925.
⁴⁰ Frankel, S., & Friedmann, M. Biochem. Zeit. 182: 434. 1927.
⁴¹ Frankel, S., & Monasterio, G. Biochem. Zeit. 213: 65. 1929.
⁴² Schryver, S. B., & Buston, H. W. Proc. Roy. Soc. London B100: 360. 1926.




sulfur. I do not know just how cystine is dealt with in the formula- 
tions of Wrinch, but assume that it may also serve as some form of 
linkage between adjacent fabrics. On any view, what we isolate as 
cystine is really present in the protein molecule fundamentally as 
cysteine and, regardless of any assumptions of protein structure, there
is considerable evidence today that cysteine does occur in certain
proteins. A few proteins give a nitroprusside reaction (myosin)
while in the native condition; many if not most proteins give this
test after denaturation.⁴³ We have no theory to account for the
presence of sulfhydryl groups in native proteins unless it be assumed
that cysteine forms one of the amino acids in the structure. The pres-
ence of increased amounts of sulfhydryl after denaturation, or the
appearance of these groups through the agency of denaturation⁴⁴ would
appear to suggest that a rearrangement of the cysteine side chains oc-
curs whereby free sulfhydryl groups are liberated. It is usually as-
sumed that much of the sulfhydryl is in combination in pairs in the
form of cystine, these linkages being broken by the rearrangement
that occurs when the protein is denatured. Thus it seems probable 
that both cysteine and cystine should be regarded as being present in 
the original protein molecule, at least in many cases. Certainly the 
study of the relationships of these two substances in the molecule is 
one of the most promising lines of attack upon the fundamental 
problem of structure. 

The other problem—the position of hydroxyglutamic acid—presents 
several puzzling features. In 1918, Dakin⁴⁵ encountered this substance
as a product of hydrolysis of casein. He described a method of isola- 
tion, difficult to be sure, by which he obtained on one occasion a yield 
of approximately 10 per cent; usually however only from 2 to 3 per 
cent could be isolated. The substance was characterized by analysis, 
by means of the salts of metals, by its conversion into the correspond- 
ing pyrrolidone carboxylic acid and by the product of oxidation, malic 
semialdehyde. Somewhat later,⁴⁶ he described its preparation from
gliadin and glutenin and also gave a method for its synthesis. 

There have been several reports in the literature of subsequent iso- 
lations of this substance although none of these is entirely convincing. 
Onuki⁴⁷ obtained “shapeless crystals” which gave the tests described
by Dakin and also a similar product of oxidation. Calvery⁴⁸ gave an


⁴³ Greenstein, J. P. Jour. Biol. Chem. 125: 501. 1938. 128: 233. 1939.
⁴⁴ Mirsky, A. E., & Pauling, L. Proc. Nat. Acad. 22: 439. 1936.
⁴⁵ Dakin, H. D. Biochem. Jour. 12: 290. 1918.
⁴⁶ Dakin, H. D. Biochem. Jour. 13: 398. 1919.
⁴⁷ Onuki, M. Jour. Chem. Soc. Japan 43: 737. 1922.
⁴⁸ Calvery, H. O. Jour. Biol. Chem. 94: 613. 1931.




analytical figure for egg albumin based upon an isolation experiment 
performed on the hydrolysate from 250 gm. of protein. The identifica- 
tion depended upon the neutralization equivalent, the silver content 
of the silver salt, and the nitrogen content of the strychnine salt. His 
preparation was not crystalline but was obtained as a white powder by 
the use of alcohol. Gulland and Morris⁴⁹ obtained a small quantity
from the barium sulfate precipitated during the removal of the acid 
from a hydrolysate of casein. The yield was very small, 0.33 per 
cent of the casein taken, but the substance was characterized as the 
nitrophenylhydrazone of malic semialdehyde. 

Harington and Randall⁵⁰ have described a synthesis of β-hydroxy-
glutamic acid which makes the substance reasonably easily available.
A careful comparison of the properties of the synthetic acid with those 
given by Dakin for the natural optically active substance showed 
many similarities and a few differences. 


Synthetic (Harington)                    Natural (Dakin)

Crystallizes from concentrated           Crystallizes “slowly in thick prisms
aqueous solution, usually in stout       from its sirupy solution,” N =
prisms with water of crystallization,    8.40, theory 8.59%. No mention of
N = 8.5, 8.6%.                           water of crystallization.

Oxidation with chloramine-T gave         Oxidation with chloramine-T gave
malic semialdehyde isolated as           malic semialdehyde isolated as
nitrophenylosazone and crystallized      nitrophenylosazone and crystallized
from nitrobenzene “in brownish           from nitrobenzene “in red-brown
red prismatic needles m.p. 291°          prismatic needles melting at 297-
(uncorr.).”                              299°.”

Silver salt insoluble, N = 3.6,          Silver salt insoluble, N = 3.56,
theory 3.7%.                             Ag 57.8, theory 3.7 and 57.3%.
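
The theoretical figures quoted in this comparison can be checked from the empirical formulas. The sketch below assumes that hydroxyglutamic acid is C5H9NO5 and that the insoluble silver salt is the disilver salt C5H7NO5Ag2; neither formula is written out in the text, and modern atomic weights are used, so agreement in the last figure with values computed in 1941 is not to be expected.

```python
# Modern atomic weights (an assumption; the 1941 values differ slightly).
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "Ag": 107.868}


def formula_weight(formula: dict) -> float:
    """Sum of atomic weights for a formula given as {element: count}."""
    return sum(ATOMIC_WEIGHTS[el] * n for el, n in formula.items())


def percent_by_weight(formula: dict, element: str) -> float:
    """Percentage of the formula weight contributed by one element."""
    return 100.0 * ATOMIC_WEIGHTS[element] * formula[element] / formula_weight(formula)


# Assumed formulas (not stated explicitly in the text).
HYDROXYGLUTAMIC_ACID = {"C": 5, "H": 9, "N": 1, "O": 5}
DISILVER_SALT = {"C": 5, "H": 7, "N": 1, "O": 5, "Ag": 2}

n_acid = percent_by_weight(HYDROXYGLUTAMIC_ACID, "N")
n_salt = percent_by_weight(DISILVER_SALT, "N")
ag_salt = percent_by_weight(DISILVER_SALT, "Ag")

print(f"N in free acid:   {n_acid:.2f} %")
print(f"N in silver salt: {n_salt:.2f} %")
print(f"Ag in silver salt: {ag_salt:.2f} %")
```

Under these assumptions the free acid comes out near 8.6 per cent nitrogen and the salt near 3.7 per cent nitrogen and a little over 57 per cent silver, in line with the quoted theory values of 8.59, 3.7, and 57.3.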


Certain differences were found by Harington and Randall; many 
of these, however, may well be due to stereoisomerism since four active 
and two inactive forms are theoretically possible. The synthetic acid 
crystallized from concentrated solution with three molecules of water 
of crystallization and was only moderately insoluble in cold water. 
When heated, the hydrated acid liquefied at 75°, solidified again and 
finally decomposed at 185°. The anhydrous acid decomposed at 
198°. Dakin’s product had no sharp melting point, and appeared to 
lose water slowly at 105° with ring closure. No such ring closure was 


⁴⁹ Gulland, J. M., & Morris, C. J. O. R. Jour. Chem. Soc. 1644. 1934.
⁵⁰ Harington, C. R., & Randall, S. S. Biochem. Jour. 25: 1917. 1931.




observed with the synthetic acid even when heated at 110° over 
phosphorus pentoxide although this did occur when the aqueous solu- 
tion was boiled at pH 4. Harington and Randall found that the 
hydrochloride of the synthetic acid has convenient properties for iso- 
lation and pointed out that its insolubility may account for some of the 
failures to isolate this substance that have been experienced. On the 
other hand, there is no evidence that the hydrochloride of the natural 
substance is notably insoluble. 

Harington and Randall record complete failure of their attempts to 
isolate hydroxyglutamic acid from casein even when Dakin’s direc-
tions were “meticulously followed.” They state, however, that the
fraction that should have contained this substance, although small and 
heavily contaminated with glutamic and aspartic acid, did in fact yield 
the oxidation product described by Dakin in “not inconsiderable 
amounts” and they did not hold the existence of hydroxyglutamic 
acid to be in doubt. 

Nevertheless the repeated failures to secure evidence for the pres- 
ence of hydroxyglutamic acid in protein hydrolysates that have been 
experienced have led to a feeling of uncertainty regarding this sub- 
stance. Although, to my knowledge, no publications have appeared 
in which its occurrence is denied, it would seem desirable that this 
problem should be discussed.

I have one small contribution to make to this discussion. Several 
years ago Dr. James Melville, working in my laboratory, undertook 
to develop a new method to estimate hydroxyglutamic acid. It seemed 
possible that, if this substance were deaminized and then oxidized with 
permanganate and bromine under the same conditions as those em- 
ployed for the oxidation of malic acid in the course of the analytical 
method for this substance,⁵¹ a dinitrophenylosazone of the brominated
oxidation product should be obtainable analogous to, if not identical 
with, the product from malic acid. This was found to be the case and 
a simple procedure to determine hydroxyglutamic acid (synthetic) 
was readily developed. Glutamic acid itself does not give this oxida- 
tion product but aspartic acid of course does. Accordingly it became 
necessary to develop a trustworthy procedure to separate glutamic and 
hydroxyglutamic acid from aspartic acid before the analytical method 
could be applied to protein analysis. The simplest way to do this 
seemed to be to convert the hydroxyglutamic and glutamic acids to 
their respective pyrrolidone carboxylic acids by ring closure; these 


⁵¹ Pucher, G. W., Vickery, H. B., & Wakeman, A. J. Ind. Eng. Chem., Anal. Ed. 6: 288. 1934.




could then be extracted at a suitable reaction (about pH 3) by means 
of ethyl acetate in a continuous extraction apparatus of the Widmark 
type.⁵² The extract would thus be freed from any contamination with
aspartic acid, and the pyrrolidone carboxylic acids could then be 
hydrolyzed back to the straight chain compounds and the mixture 
analyzed by the oxidation procedure. Preliminary experiments only 
were completed before Dr. Melville had to return to New Zealand, but 
the observations he made were most encouraging from one point of 
view and most disturbing from another. 

The conditions under which about 90 per cent of the acids could 
be converted to the respective ring compounds were found. It was 
sufficient to heat at boiling temperature in dilute solution for 24 hours. 
With protein hydrolysates, a shorter period of heating in the autoclave 
at 120-125° was adopted, but difficulty was experienced in obtaining 
so complete a ring formation; three successive heatings and extrac- 
tions were necessary. The most serious trouble, however, was with 
the instability of the hydroxypyrrolidone carboxylic acid when at- 
tempts were made to hydrolyze it back to the straight chain compound. 
Significant losses were experienced even when 2N hydrochloric acid 
at boiling temperature for 1.5 hours was used; with 20 per cent acid 
the loss was serious. 

This brought up the question whether hydroxyglutamic acid itself 
is stable when boiled with strong acid. When 1 gram of casein and 
0.1 gram of hydroxyglutamic acid were boiled 26 hours with 30 per 
cent sulfuric acid, the pyrrolidone carboxylic acid recovered after the 
subsequent ring formation and extraction operations was only about 
10 per cent of itself higher than that from a casein blank. The 
bromine-permanganate oxidation test was positive and its intensity 
corresponded to this small increase in pyrrolidone carboxylic acid as 
determined by the increase in amino nitrogen on hydrolysis. The 
inference was clear, however, that little hydroxyglutamic acid survived 
the treatment with boiling acid necessary for hydrolysis of the pro- 
tein, and it was also significant that no oxidation test could be secured 
from hydrolysates of casein prepared in the usual way and subjected 
to the procedures for pyrrolidone carboxylic acid formation and ex-
traction. This recalls the observation of Knoop and his associates⁵³


⁵² Pucher, G. W., & Vickery, H. B. Ind. Eng. Chem., Anal. Ed. 11: 656. 1939. 12: 27. 1940.
⁵³ Knoop, F., Ditt, F., Hecksteden, W., Maier, J., Merz, W., & Harle, R. Zeit. physiol. Chem. 239: 30. 1936.




who found that synthetic hydroxyglutamic acid is in part decomposed 
to ammonia and α-ketoglutaric acid by boiling acid.*

These observations provide what is possibly a clue to the present 
situation with respect to hydroxyglutamic acid. Examination of 
Dakin’s original paper shows that he hydrolyzed his casein for only 12 
to 16 hours with 5 times its weight of 25 per cent by volume sulfuric 
acid, i. e. 35 per cent by weight. Is it possible that his success was due
to a far less severe treatment of the protein than is now customary? 
Until these matters have been settled, I feel that one must go very 
carefully indeed before declining to accept Dakin’s work. 
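
The conversion from 25 per cent by volume to about 35 per cent by weight can be reproduced as a rough consistency check. The sketch below assumes that "25 per cent by volume" means 25 ml. of concentrated sulfuric acid made up to 100 ml. of solution, that the concentrated acid is about 96 per cent H2SO4 with a density near 1.84 gm. per ml., and that the diluted acid has a density near 1.26 gm. per ml. These figures are approximate handbook-style values supplied for illustration; none of them comes from the text.

```python
def weight_fraction_from_volume_fraction(
    volume_fraction_conc_acid: float,
    density_conc_acid: float,
    purity_conc_acid: float,
    density_diluted_acid: float,
) -> float:
    """Weight fraction of H2SO4 in a diluted acid made up by volume.

    volume_fraction_conc_acid: ml of concentrated acid per ml of final solution
    density_conc_acid:         g per ml of the concentrated acid
    purity_conc_acid:          weight fraction of H2SO4 in the concentrated acid
    density_diluted_acid:      g per ml of the final diluted solution
    """
    grams_h2so4_per_ml = volume_fraction_conc_acid * density_conc_acid * purity_conc_acid
    return grams_h2so4_per_ml / density_diluted_acid


# Assumed, approximate values (not taken from the text).
w = weight_fraction_from_volume_fraction(
    volume_fraction_conc_acid=0.25,
    density_conc_acid=1.84,
    purity_conc_acid=0.96,
    density_diluted_acid=1.26,
)
print(f"approximately {100 * w:.0f} per cent H2SO4 by weight")
```

The density of the diluted acid itself depends on its concentration, so the calculation is somewhat circular; it shows only that the two figures given for the acid are mutually consistent under these assumptions.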

One further point with regard to the amino acids that have been 
shown to be present in the protein molecule should perhaps be made. 
Aspartic and glutamic acids are indeed products of protein hydrolysis, 
and so fulfill the definitions that have been discussed above; but, in 
the protein molecule itself, these substances occur to a large extent 
as their respective amides asparagine and glutamine. The literature 
of amide nitrogen in proteins is long and interesting;⁵⁴ the suggestion
that asparagine may be present in the intact protein molecule was made 
by Ritthausen in his book in 1872, and probably independently by 
Nasse the same year. The view was supported by the investigations 
of Hlasiwetz and Habermann which appeared the following year, and, 
in fact, the name glutamine was coined by these authors at that 
time, long before glutamine itself was recognized to be present in plant 
tissues by Schulze and ultimately isolated. In recent years, this early 
speculation has been confirmed by the isolation of asparagine from 
enzyme digests of edestin by Damodaran⁵⁵ and of glutamine from
gliadin by Damodaran, Jaaback, and Chibnall.⁵⁶ Consideration of
the quantitative relationships between the dicarboxylic amino acids 
and the amide nitrogen shows, however, that, in those proteins for 
which adequate data exist, only a part of the total amount of di- 
carboxylic acid can be present in amide combination. The exact dis- 
tribution of the amide nitrogen between the aspartic and glutamic acid 
in any given case is a problem for the future. 


DETERMINATION OF AMINO ACIDS 


Sufficient has perhaps been said to indicate that no final answer can 
yet be given to the question, which are the protein amino acids? And, 


* Dr. Dakin has informed me that he has been unable to convert natural hydroxyglutamic
acid into α-ketoglutaric acid by this procedure.

54 Vickery, H. B., & Osborne, T. B. Physiol. Rev. 8: 393. 1928.

55 Damodaran, M. Biochem. Jour. 26: 235. 1932.

56 Damodaran, M., Jaaback, G., & Chibnall, A. C. Biochem. Jour. 26: 1704. 1932.




as I shall now show, a still less precise answer can be given to the 
question, how can the proportions of the different amino acids yielded 
by proteins best be determined? At the outset, the purpose of protein 
analysis must be defined. If one is merely interested in whether a 
series of protein fractions, secured by some systematic method of 
precipitation, differ from each other in chemical composition, there 
are literally scores of methods that may be employed. To show that 
the first and last members of the series differ in, for example, trypto- 
phane content, a notably inaccurate method is almost as good as a 
thoroughly accurate one, provided that it is applied under rigidly con- 
trolled conditions. However, if the purpose is to obtain the true 
tryptophane content so that calculation of molecular relations with 
other amino acids may be carried out, the choice of methods is neces- 
sarily more circumscribed. This is the type of analytical operation 
that is envisaged in the following discussion since I feel that we are 
now fundamentally interested in absolute values rather than in com- 
parative values. 

For the purposes of discussion therefore, the well-known amino acids 
are divided into three groups, and I shall try to indicate the position 
of our analytical knowledge with respect to most of them. 

It will be convenient to begin in reverse order and specify the amino 
acids for which there are no adequate analytical methods whatever. 
The substances in Group A of TABLE 2 are the amino acids for which 
our information is largely, if not entirely, qualitative. To be sure 
more or less of each of them has been isolated from various proteins 
and to this extent the information may be thrown into a semi-quantita- 
tive form of statement. But the precision of the information is very 
low indeed—we may not know the actual quantities present in the 
protein molecule within several hundred per cent in some cases. 

I realize that this classification is one in which personal opinion 
plays a large part and it is given entirely with the object of eliciting 
discussion. There will probably be little debate about the hydroxy- 
acids, serine, threonine,* and hydroxyglutamic acid. Their presence 
(or absence) has been demonstrated in relatively few cases. It is pos- 
sible that serine may become better known through application of 
Rapoport’s method⁵⁷ to convert it to glyceric acid which gives a color
reaction with naphthoresorcinol, but this method awaits detailed 
study and full application. 

* Shinn and Nicolet (Jour. Biol. Chem. 138: 91. 1941) have recently described a method
for the determination of threonine by means of oxidation with periodate which appears to
be satisfactory, and mention that a method for serine is being developed.

57 Rapoport, S. Biochem. Zeit. 281: 30. 1935. 289: 406. 1937.




TABLE 2

AMINO ACIDS CLASSIFIED ACCORDING TO DEGREE OF ACCURACY WITH
WHICH THEY CAN BE DETERMINED

A. Amino acids concerning which our information is little better than qualitative.

    Serine                    Isoleucine
    Threonine                 Norleucine
    Hydroxyglutamic acid      Thyroxine
    Valine                    Diiodotyrosine

B. Amino acids for which methods of a considerable degree of probable accuracy
   have been proposed. These methods have been applied to very few proteins
   as yet.

    Glycine                   Proline
    Alanine                   Hydroxyproline
    Leucine                   Phenylalanine

C. Amino acids for which existing methods appear to give satisfactory results and
   which have been widely applied.

    Cystine                   Glutamic acid
    Tyrosine                  Arginine
    Tryptophane               Histidine
    Methionine                Lysine
    Aspartic acid


Isoleucine has seldom been demonstrated in protein hydrolysates, 
the custom having been to weigh the amino acids of the correct nitro- 
gen or carbon content and report them as “leucines.” With respect 
to norleucine, we have only a few statements, all of them merely 
qualitative. 

Valine has been prepared in almost every case by the ester distilla- 
tion method and subsequent fractional crystallization of the free acids 
and, until recently, this was for the most part true also for the 
leucines. This group of substances is separated from phenylalanine 
during distillation of the esters—this being in fact the chief purpose 
of the esterification procedure—and the separation of individual acids 
depends on the skill of the investigator in the use of fractional
crystallization, of metallic salts (e. g. copper, lead, zinc), or of organic
derivatives. These methods are, however, really preparation methods
as distinguished from analytical methods and the yields are often no- 
tably poor. The inevitable errors introduced in the ester distillation 
procedure were discussed by Osborne⁵⁸ many years ago and are well
recognized. 

The inclusion of the iodine-containing amino acids thyroxine and 
diiodotyrosine in Group A will also be readily understood, in spite of 


58 Osborne, T. B., & Jones, D. B. Am. Jour. Physiol. 26: 305. 1910.




the facts that Brand and Kassell⁵⁹ have recently developed new
methods for them and that several other methods have been described. 
Although much labor has been expended on thyroxine isolation from 
thyroid gland tissue, and the yields obtained from glands of different 
sources are a matter of interest because of the physiological and 
pharmacological uses, there is very little information available on the 
exact thyroxine content of purified preparations of the protein
thyroglobulin. For diiodotyrosine (iodogorgoic acid), which is also a
component of thyroglobulin,⁶⁰ there is similarly little more than
qualitative information.

The second group of amino acids in the table (Group B, TABLE 2) is 
debatable ground. Glycine, alanine, proline, oxyproline, leucine, and 
phenylalanine until recent years could be separated only by the 
Fischer procedure and, although much valuable data was secured, the 
results can only, even under the best conditions and with the most 
careful attention to details, have been approximations. There is no 
way in which one can calculate what the error in these determinations 
may have been. Doubtless many of the values in the literature ap- 
proach the truth quite closely; others, however, must be much too 
low and some may even be too high. 

These early data served to discriminate between various proteins in 
that they showed, for example, that prolamines were very low in 
glycine and high in proline, but they could have no possible significance 
for the calculation of molecular ratios between different amino acids 
nor for the calculation of minimal molecular weights of the protein. 

In the past few years, the analytical chemistry of this group of 
amino acids has been placed upon a much more encouraging footing 
by the investigations of Bergmann. It is not too much to hope that 
his new methods, when generally applied to a wide range of purified 
proteins, may go far to provide the solution of the problem of the 
analytical chemistry of this group of substances. 

The method for glycine⁶¹ involves the precipitation of the complex
compound of potassium trioxalato chromiate, [Cr(C₂O₄)₃]K₃ + 3H₂O,
that is formed when the reagent is added, together with two volumes 
of alcohol, to the solution that contains glycine. Under carefully 
controlled conditions of temperature, acidity and time, a constant 
proportion of the glycine (about 88 per cent) is precipitated. With 
the use of this value, the quantities of glycine precipitated from a 

59 Brand, E., & Kassell, B. Jour. Biol. Chem. 131: 489. 1939.

60 Foster, G. L. Jour. Biol. Chem. 83: 345. 1929.

61 Bergmann, M., & Fox, S. W. Jour. Biol. Chem. 109: 317. 1935. Bergmann, M.,
& Niemann, C. Jour. Biol. Chem. 122: 577. 1937.




protein hydrolysate are corrected and an estimate is made of the 
glycine yielded. 
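
The correction described above amounts to a single division. A minimal sketch in Python, assuming the constant proportion of about 88 per cent quoted in the text; the function name and the sample quantity are illustrative only and are not taken from Bergmann's papers:

```python
def corrected_glycine(precipitated: float, fraction_precipitated: float = 0.88) -> float:
    """Estimate the glycine yielded from the amount recovered in the precipitate.

    fraction_precipitated is the constant proportion (about 0.88 in the text)
    that is precipitated under the controlled conditions; it is empirical.
    """
    if not 0.0 < fraction_precipitated <= 1.0:
        raise ValueError("fraction_precipitated must lie in (0, 1]")
    return precipitated / fraction_precipitated


# Hypothetical example: 22.0 mg of glycine found in the precipitate.
print(round(corrected_glycine(22.0), 1))  # 25.0
```

The result is only as good as the assumed constant, which holds solely under the controlled conditions of temperature, acidity and time.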

The determination of alanine requires the previous removal of 
glycine and involves a series of small scale determinations of the
nitrogen precipitated by dioxalatodipyridinochromiato acid⁶² (dioxopyridic
acid), [Cr(C₂O₄)₂.(C₅H₅N)₂].H, to find the proper quantity
of reagent to add in order to precipitate the alanine. The weight of
the salt is corrected for the small solubility to obtain the yield of 
alanine. 

To determine proline, advantage is taken of the insolubility of 
the proline compound of tetrathiocyanato-dianilidochromiato acid, 
[Cr(CNS)₄(C₆H₅.NH₂)₂].H (rhodanilic acid). Again a solubility
correction must be ascertained and applied. 

From the filtrate from the proline compound, hydroxyproline may 
be precipitated by the addition of Reinecke salt, [Cr(CNS)₄(NH₃)₂].NH₄,
together with pyridine. Purification of the hydroxyproline is
difficult and the method is probably considerably less satisfactory as 
an analytical procedure than that employed for the other amino acids. 
Nevertheless higher yields, as compared with the results of other 
methods, were obtained. 

A consideration of the difficulties and uncertainties involved in these 
precipitation methods—particularly with respect to the incompleteness 
of precipitation and consequent necessity for the use of solubility cor- 
rections—has led during the past two years to the development of 
methods of a new type in Bergmann’s laboratory.⁶³ If to a solution
of an amino acid a reagent is added in quantity less than sufficient to 
precipitate all the amino acid, a precipitate will form that contains, 
when equilibrium is reached and on the simplest assumption of binary 
salt formation, equimolecular quantities of the reagent and the amino 
acid. The mother liquor will contain a certain concentration of the 
ions of both reagent and amino acid. If a known amount of reagent 
has been employed, the concentration of the reagent left in the mother 
liquor can be calculated and an equation set up for the solubility 
product of the two ions. If a second experiment with a different 
amount of reagent is then carried out, and the solubility product cal- 
culated, these two results can be equated. In the resulting expression, 
the only unknown is the molar concentration of the amino acid in the 
solution at the beginning. This calculation assumes that molar con- 

62 Bergmann, M. Jour. Biol. Chem. 122: 569. 1938.

63 Stein, W. H., Niemann, C., & Bergmann, M. Jour. Am. Chem. Soc. 60: 1703. 1938.
Bergmann, M., & Stein, W. H. Jour. Biol. Chem. 128: 217. 1939. Ing, H. R., &
Bergmann, M. Jour. Biol. Chem. 129: 603. 1939.




centrations can be substituted for ionic concentrations, that activity 
coefficients do not change during the experiment, and, for success, re- 
quires rigid control of the physical conditions under which the experi- 
ment is conducted. The great advantage is that no knowledge of the 
solubility of the amino acid compound is required and, accordingly, 
the insidious errors are avoided that may creep in when the solubility 
of a compound in water is used as a correction for the solubility of the 
same compound in a solution of amino acids. The justification of the 
method is that, in spite of the assumptions involved, quantitative re- 
coveries of amino acids, either in pure solution or in mixtures, can be 
readily secured. 
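
The calculation outlined above can be sketched as follows. This is an illustration under stated assumptions, not Bergmann's published procedure: the salt is taken to be strictly 1:1, molar concentrations stand in for activities, and the quantity measured in each experiment is assumed to be the amount of salt precipitated per litre. Equating the two solubility products, (A - p1)(r1 - p1) = (A - p2)(r2 - p2), leaves the initial amino acid concentration A as the only unknown. All names and numbers are hypothetical.

```python
import math


def initial_amino_acid(r1: float, p1: float, r2: float, p2: float) -> float:
    """Solve (A - p1)(r1 - p1) = (A - p2)(r2 - p2) for A.

    r1, r2: molar concentration of reagent added in each experiment.
    p1, p2: moles of 1:1 salt precipitated per litre in each experiment
            (assumed here to be the measured quantity).
    """
    left1 = r1 - p1  # reagent left in the mother liquor, experiment 1
    left2 = r2 - p2  # reagent left in the mother liquor, experiment 2
    if math.isclose(left1, left2):
        raise ValueError("the two experiments must leave different reagent concentrations")
    return (p1 * left1 - p2 * left2) / (left1 - left2)


def precipitated(a: float, r: float, ksp: float) -> float:
    """Forward model used only to make up test data: salt precipitated at equilibrium."""
    s = a + r
    return (s - math.sqrt(s * s - 4.0 * (a * r - ksp))) / 2.0


# Invented data: true concentration 0.100 M, solubility product 1.0e-3.
a_true, ksp = 0.100, 1.0e-3
p1 = precipitated(a_true, 0.050, ksp)
p2 = precipitated(a_true, 0.080, ksp)
print(round(initial_amino_acid(0.050, p1, 0.080, p2), 6))  # 0.1
```

Note that the solubility product itself never has to be known; it cancels when the two experiments are equated, which is the advantage the text describes.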

New data have already been obtained in several cases by this method. 
For example, the glycine content of gelatin, previously given as 25.5 
per cent, is now found to range from 26 to 27 per cent according to 
the source of the protein preparation. The proline, formerly 19.7 per 
cent, is now 17.5 per cent with a precision of about 2.5 per cent of this 
value. The revision downwards by the new method furnishes an il- 
lustration of the effect of errors in the earlier method. 

In order conveniently to apply this method, Bergmann and Stein⁶⁴
have proposed an entirely new reagent for certain amino acids, namely
naphthalene-β-sulfonic acid, the salts of which are referred to as
nasylates, to distinguish them from the naphthalenesulfonyl
derivatives. The selection of this substance, which yields salts that are only
moderately insoluble, is possible because a definite and appreciable 
solubility is a distinct advantage in the application of the new solubility 
product method and simple binary salt formation with the amino acid 
is desirable. With this reagent, the determination of leucine, arginine, 
and probably other amino acids, is possible. The salts of isoleucine 
and valine are more soluble than those of leucine and, accordingly, 
these amino acids do not interfere with leucine determinations in 
proteins. The determination of phenylalanine with this new reagent 
or some substance of analogous properties also appears to be a definite 
possibility. 

As yet this method has not been applied to more than a few pro- 
teins. But the accuracy with which solutions of amino acids, or simple 
mixtures of amino acids can be analyzed suggests that most valuable 
results are to be anticipated. 

Dr. Bergmann kindly permits me to say at this point that an ex- 
tensive series of sulfonic acid reagents is being investigated and that 
suitable reagents for many specific precipitations have been found. 


64 Bergmann, M., & Stein, W. H. Jour. Biol. Chem. 129: 609. 1939.




The third group of amino acids (Group C, TABLE 2), which contains 
those substances for which analytical methods of some accuracy exist, 
also furnishes material for considerable debate. However one may 
answer the question, what is the best method to determine cystine, one 
is certain to encounter argument, and since this problem, and the 
allied problems that concern methionine, tyrosine, and tryptophane 
have been discussed at great length in the literature of the past few 
years, it seems scarcely necessary to go into it at all deeply in this place.
One is perhaps justified in setting up as a criterion for judgment among
the various methods the requirement that they should yield
substantially the same result when careful comparative tests are
conducted. Hess and Sullivan⁶⁵ have recently determined the cystine
content of several proteins by five different current methods, those of 
Sullivan, of Okuda, of Shinohara, of Folin and Marenzi, and of Vickery 
and White. Comparison of the results shows a remarkably close 
agreement for four of the five methods, only that of Folin and Marenzi 
being significantly different. Such results give a considerable degree 
of confidence in the accuracy of each of these four methods and suggest 
that choice among them can properly be made on grounds of con- 
venience or personal preference. Other similar comparisons have been 
made in Chibnall’s laboratory by Bailey.⁶⁶

The two dicarboxylic amino acids have received far less attention 
than most of the others. The methods employed have been gradually 
developed from the early purely preparation type of procedure and 
have recently been considerably refined in Chibnall’s laboratory. 
There is still room for much improvement, however, and it is doubtful 
if most of the data in the literature are as satisfactory as the results 
for cystine, methionine, tyrosine, and tryptophane. 

Of the three basic amino acids, arginine can now be determined 
with great accuracy. There are a number of methods available, each 
with advantages for certain purposes. A careful comparison of recent 
results in my laboratory of a new modification of the flavianic acid 
method⁶⁷ with published values by other methods suggests a definite
superiority for the new method on grounds both of accuracy and of 
precision. Nevertheless the agreement of several of the other methods, 
notably the beautiful enzymatic method of Hunter, with the results 
of the new procedure indicates that there is latitude for choice. 

Histidine and particularly lysine, however, are in a less favorable 


65 Hess, W. C., & Sullivan, M. X. Jour. Biol. Chem. 128: 93. 1939.
66 Bailey, K. Biochem. Jour. 31: 1396. 1937.
67 Vickery, H. B. Jour. Biol. Chem. 132: 325. 1940.




position. There are several procedures whereby these substances 
may be determined and, a year ago, I should not have hesitated to 
claim that the large-scale silver precipitation method, originally de- 
vised by Kossel and more recently modified in my laboratory, yields 
results that are highly trustworthy. Today I am not so certain. 
By the use of the new flavianic acid procedure, appreciably higher 
results for arginine have been secured in many cases than we were 
able to get by the silver precipitation method and it is only natural 
to inquire whether renewed attention to specific methods for histidine 
and lysine may not likewise result in a marked improvement. For 
the present, then, I can only say that the results of the large-scale 
silver precipitation method are probably the most trustworthy avail- 
able, but I doubt if they can be regarded as completely satisfactory 
until new methods have been devised and the data have been con- 
firmed. 

To sum up what may be regarded as a conclusion with respect to 
the trustworthiness of analytical data on proteins, I think that we 
have some reason to be content with many of the figures for cystine, 
tyrosine, tryptophane, methionine, and arginine. There are a few 
values for glutamic and aspartic acid worthy of serious consideration 
and some of the data for histidine and lysine are probably quite close 
to the truth. Doubtless, when Bergmann’s new methods have been 
more fully developed, we shall have valuable data for several of the 
mono-amino acids, but as yet we are awaiting general application of 
these methods to proteins that can be regarded as homogeneous 
chemical entities. 

This conclusion may seem unduly pessimistic to some, but I 
sincerely feel that truly accurate values for a few amino acids in a 
protein that has been shown by physico-chemical methods to be a 
substance rather than a mixture are of greater value for our under- 
standing of the complexities of the protein molecule than is a mass 
of approximations. 


APPLICATION OF ANALYTICAL RESULTS 


Analytical data for the amino acids derived from proteins may 
be roughly assigned to two categories. There is a relatively large 
group of determinations obtained by Fischer, Osborne, and Abder- 
halden and their respective associates, and by Jones, Dakin, and a 
few subsequent workers in which the object was to account for as 
large a part of the protein molecule as possible. Compilations of 




data for many proteins can be made today which add up to a very 
satisfactory total and, in the cases of zein, gelatin, and a few others, 
this total accounts for 90 per cent or more of the molecule. This 
information is of great value even though we know that many of 
the individual determinations, such as those of glycine, alanine, 
valine, leucine, etc., do not have a high degree of probable accuracy. 
Numerous attempts have been made to calculate molecular ratios 
from such data, with a moderate degree of success, but there is still 
so much doubt about the accuracy of the details that the results 
can be regarded only as approximations. 

In another category, however, are the results of the determinations 
of the amino acids in Group C in TABLE 2. For most of these
substances the methods permit a series of check determinations without
the expenditure of too much labor and an estimate of both accuracy 
and precision becomes possible. Accordingly the data can properly 
be employed to calculate molecular ratios and minimal molecular 
weights.

In the following tables* (TABLES 3 to 7) a number of these
calculations are presented. The data that have been selected are taken
from studies that show evidence of unusual care and attention to 
controls. In most cases the values are averages of several or of 
many separate determinations and, in their selection, consideration 
has been given to the results of other investigators who have obtained 
values of a closely agreeing magnitude. It is not suggested that these 
are the best values, that being largely a matter of opinion, nor are 
they invariably the highest values; they are simply the results of 
unusually careful analytical studies. 

Many methods of presenting such data are possible but, in the 
following tables, since the emphasis is to be placed upon the 
analytical aspect of the problem, the first five columns give the cal- 
culations of the minimal molecular weight of the protein derived from 
each item. This minimal molecular weight is divided into the value 
assigned to the true molecular weight, obtained from physico-chemical 
measurements, and the nearest integer to the quotient is given in 
Column 6. Multiplication of the figures in Columns 5 and 6 gives 
the calculated molecular weight in Column 7, and the result is calcu- 
lated as a percentage of what may be called the “physical” molecular 
weight in Column 8. From this figure the agreement of the calcu- 
lated “chemical” molecular weight may be easily appreciated. 
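
The column arithmetic just described can be sketched in a few lines of Python. This is an illustration only: the function names are invented here, and the atomic and molecular weights used in the example (55.85 for iron, 174.2 for arginine) are modern values assumed for the purpose, which may differ slightly from those used in preparing the tables.

```python
def minimal_molecular_weight(percent: float, unit_weight: float) -> float:
    """Weight of protein that contains one atom or residue (Column 5)."""
    return 100.0 * unit_weight / percent


def chemical_molecular_weight(percent: float, unit_weight: float, physical_mw: float):
    """Return (minimal weight, assumed number, calculated weight, per cent of physical)."""
    minimal = minimal_molecular_weight(percent, unit_weight)
    number = max(1, round(physical_mw / minimal))   # nearest integer (Column 6)
    calculated = number * minimal                   # Column 7
    percentage = 100.0 * calculated / physical_mw   # Column 8
    return minimal, number, calculated, percentage


# Iron in horse hemoglobin, 0.335 per cent, against an assumed weight of 66700.
minimal, number, calculated, percentage = chemical_molecular_weight(0.335, 55.85, 66700)
print(number, round(minimal), round(percentage, 1))
```

With these assumed constants the sketch gives 4 atoms of iron and a minimal weight near 16670, close to the tabulated 16660; small differences of this kind are to be expected from the atomic weights and rounding adopted.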

* In preparing TABLES 3 to 7 inclusive, I have had the assistance of comprehensive tables
of protein composition compiled by Dr. E. J. Cohn. He has kindly placed manuscript
copies of these tables at my disposal and they have saved many hours of tedious search of
the literature. It is a pleasure to express my sincere thanks for this help.




TABLE 3
HEMOGLOBIN (HORSE)
Assumed Molecular Weight 66700

                  Observed  Author-     Moles    Minimal    Assumed  Calculated  Percentage
                  per cent      ity   per gm.  molecular  number of   molecular          of
                                       × 10⁻⁵     weight   atoms or      weight       66700
                                                           residues
1                        2        3         4          5          6           7           8
Iron                 0.335        1      5.99      16660          4       66640        99.9
Sulfur               0.390        1     12.17       8220          8       65760        98.6
Sulfide sulfur       0.191        2      5.96      16780          4       67120       100.7
Sulfur                0.57        3      17.8       5623         12       67480       101.2
Arginine              3.59        4     20.62       4849         14       67890       101.8
Histidine             7.64        5     49.26       2030         33       66990       100.4
Lysine                8.10        5     55.44       1804         37       66750       100.1
Tyrosine              3.15        6      17.4       5749         12       68980       103.4
Tryptophane           1.28        6      6.27      15940          4       63760        95.6
Cystine               0.41        7      1.71      58590          1       58590          88
Aspartic acid          8.9        8      66.9       1496         45       67320       100.9
Glutamic acid          6.3        8      42.8       2335         29       67710       101.5
Amide N               1.01        8      72.1       1387         48       66580        99.8

If tryptophane is 1.22%, millimols per gm. = 5.98; min. molecular weight = 16730; for 4 mols 66920
If cystine is 0.36%, millimols per gm. = 1.49; min. molecular weight = 66730; for 1 mol 66730
If cystine is 0.72%, millimols per gm. = 2.99; min. molecular weight = 33370; for 2 mols 66740

1 Zinoffsky, O. Zeit. physiol. Chem. 10: 16. 1886.
2 Schulz, F. Zeit. physiol. Chem. 25: 16. 1898.
3 Valer, J. Biochem. Zeit. 190: 444. 1927.
4 Vickery, H. B. Jour. Biol. Chem. 132: 325. 1940.
5 Vickery, H. B., & Leavenworth, C. S. Jour. Biol. Chem. 79: 377. 1928.
6 Folin, O., & Marenzi, A. D. Jour. Biol. Chem. 83: 89. 1929.
7 Vickery, H. B., & White, A. Jour. Biol. Chem. 99: 701. 1933.
8 Chibnall, A. C., & Bailey, K. Personal communication.


It is obvious that when the assumed number of residues or atoms 
in Column 6 is small, the probable validity of the calculated “chemi- 
cal” molecular weight is high. In this case the accuracy of the 
analytical value receives its sharpest test and at the bottom of 
TABLE 3 a calculation is given which shows how important the second 
place of decimals in the tryptophane and cystine values may be. 

On the other hand, when the number in Column 6 is large, there is 
a considerable element of uncertainty in the validity of the agree- 
ment between the “chemical” and the “physical” molecular weights. 
Frequently a neighboring integer may equally well be chosen. 
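
The contrast between small and large numbers in Column 6 can be made concrete with a short sketch. The function below is hypothetical and simply recomputes, from the minimal molecular weights printed in TABLE 3, how well the nearest integer and its two neighbours would fit the assumed molecular weight of 66700.

```python
def neighbouring_fits(minimal_mw: float, physical_mw: float):
    """Per cent of the physical molecular weight for the nearest integer and its neighbours."""
    nearest = max(1, round(physical_mw / minimal_mw))
    candidates = [n for n in (nearest - 1, nearest, nearest + 1) if n >= 1]
    return [(n, round(100.0 * n * minimal_mw / physical_mw, 1)) for n in candidates]


# Minimal molecular weights taken from TABLE 3 (horse hemoglobin).
print(neighbouring_fits(16660, 66700))  # iron:   [(3, 74.9), (4, 99.9), (5, 124.9)]
print(neighbouring_fits(1804, 66700))   # lysine: [(36, 97.4), (37, 100.1), (38, 102.8)]
```

For iron a change of one atom shifts the calculated weight by 25 per cent, so the choice of 4 is unambiguous; for lysine the neighbouring integers differ by less than 3 per cent, which is within the uncertainty of the analysis.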

If it were possible to assign weighting factors to these several
analytical results, one might calculate a molecular weight that would
express the sum total of our best knowledge derived from analytical 
chemistry. At the moment, however, there is no objective way in 




which this can be done, although it is obvious that some of the results 
are more highly significant than others. 


Hemoglobin 


The iron value of Zinoffsky has been repeatedly confirmed, no- 
tably by Valer in recent years, and it is from this value that the 
assumed molecular weight is calculated (TABLE 3). The physico-chemical
results of Adair and of Svedberg are in very close agreement.

Zinoffsky’s sulfur value differs from that of Valer in a most inter- 
esting way. The analytical methods employed were widely differ- 
ent; Zinoffsky used an oxidation procedure while Valer employed 
the reduction method of ter Meulen in which the protein is heated in 
an atmosphere of hydrogen and the gases are passed over a platinum 
catalyst, sulfur being determined as hydrogen sulfide by iodine titra- 
tion. Valer’s values were obtained on the hemoglobin of individual 
animals and, with the dog, he found evidences of definite differences 
that he suggested might have a genetic origin. Whether or not this 
is the explanation of the results for the horse is a problem that awaits 
study. From the analytical standpoint it is important to note that 
both values fit well with the rest of the data. 

The chemical meaning of the sulfide sulfur determination is also 
still unexplained. The value of Schulz given in the table is identical 
with Osborne’s and it is significant that, in this case, it is twice as 
great as the cystine sulfur. In other cases (TABLE 7), sulfide sulfur 
agrees with the cystine sulfur and the result in the present case 
suggests that the Vickery and White cystine value may be consider- 
ably too low. 

Stern, Beach, and Macy⁶⁸ have recently studied the globins of
several species, cystine being determined by the polarographic method
of Brdicka. The result for horse globin was 0.77 per cent, and they
report a value of 0.81 per cent by Graff, Maculla, and Graff’s⁶⁹
modification of the Vickery and White method. On the assumption that
horse hemoglobin contains 2 per cent of haem, the polarographic 
value for cystine would be 0.75 per cent and the calculations at the 
bottom of TABLE 3 show that this agrees well with the assumption 
that horse hemoglobin contains 2 moles of cystine per molecule. Fur- 
thermore such an assumption would bring the value for sulfide sulfur 
in line with observations on other proteins. Uncertainty still exists, 
however, with respect to the total sulfur content of horse hemoglobin, 


68 Stern, A., Beach, E. F., & Macy, I. G. Jour. Biol. Chem. 130: 733. 1939.
69 Graff, S., Maculla, E., & Graff, A. M. Jour. Biol. Chem. 121: 81. 1937.


TABLE 4
EGG ALBUMIN (HEN)
Assumed Molecular Weight 36000 and 40000

[Table set sideways in the original; the numerical entries are not legible in this copy. Rows: arginine, histidine, lysine, tyrosine, tryptophane, cystine, cysteine, methionine, glutamic acid, aspartic acid, sulfur, phosphorus, sulfide sulfur, amide nitrogen. Columns as in TABLE 3, with the assumed number of atoms or residues, the calculated molecular weight, and the percentage given for each of the two assumed molecular weights.]

1 Vickery, H. B. Jour. Biol. Chem. 132: 325. 1940.
2 Vickery, H. B., & Shore, A. Biochem. Jour. 26: 1101. 1932.
3 Folin, O., & Marenzi, A. D. Jour. Biol. Chem. 83: 89. 1929.
4 Kassell, B., & Brand, E. Jour. Biol. Chem. 125: 145. 1938.
5 Calvery, H. O. Jour. Biol. Chem. 94: 613.
6 Osborne, T. B. Jour. Am. Chem. Soc. 24: 140. 1902.
7 Shore, A., Wilson, H., & Stueck, G. Jour. Biol. Chem. 112: 407. 1935.



as well as the cystine content and it is obvious that further study of 
the sulfur and the sulfur-containing amino acids of hemoglobin ob- 
tained from a known source is needed. 

The values for the bases and the dicarboxylic acids require little 
comment. Of these the arginine is known with the greatest cer- 
tainty and the proportion present is such that 14 may be selected for 
the number of residues with some confidence; neither 13 nor 15 fits 
the data at all well. For the others the choice is not so clearly de- 
fined. The chief significance of the amide nitrogen values is to show 
that both aspartic and glutamic acids must be present in part as 
their respective amides. 


Egg Albumin 


The calculations for egg albumin (TABLE 4) introduce a further
complexity inasmuch as the molecular weight derived from physico- 
chemical considerations is still uncertain. The value 36000 is indi- 
cated by osmotic pressure and by the early ultracentrifuge data; 
later figures of Svedberg indicate a value near 40000. Calculations 
have accordingly been made on both bases and, in general, the agree- 
ment of the analytical data is better when the higher molecular 
weight is chosen. Furthermore the higher value brings the cystine 
and methionine values of Kassell and Brand in line with Osborne’s 
very carefully determined total sulfur, and the cystine also agrees 
with the sulfide sulfur, although whether or not this is evidence is 
still debatable. Calvery’s phosphorus does not lead to a decision 
and obviously the phosphorus content as well as the cystine and 
tryptophane would repay further study. 

A recent paper of Bernhart⁷⁰ gives new values for tyrosine (3.89
per cent) and tryptophane (1.13 per cent) which agree closely with 
the earlier values of Folin and Marenzi used in TABLE 4. In addi- 
tion there is a new value for phenylalanine by the Kapeller-Adler 
method of 5.37 per cent. This value, if the method is assumed to be 
accurate, leads to a minimal molecular weight of 3147 and accord- 
ingly to estimates of a molecular weight of 37700 for 12 units and 
40900 for 13. Bernhart’s analysis of his own data and that of others 
led him to an estimate of 36900 as a probable molecular weight of 
egg albumin. 

Insulin 

The situation (TABLE 5) with respect to the zinc content of crystalline
insulin is similar to that of the sulfur content of hemoglobin.


70 Bernhart, F. W. Jour. Biol. Chem. 132: 189. 1940.




TABLE 5
INSULIN
Assumed Molecular Weight 35100

            Observed  Author-  Moles per gm.  Minimal    Assumed number  Calculated  Percentage
            per cent  ity      × 10⁻⁵         molecular  of atoms or     molecular   of 35100
                                              weight     residues        weight
Zinc        0.52      1        7.95           12570      3               37710       107.4
Zinc        0.36      2        5.51           18160      2               36320       103.5
Sulfur      3.34      3,4      104.2          960        36              34560       98.5
Cystine     12.5      Ul       52.0           1922       18              34590       98.5
Cysteine    12.6      8        104.1          961        36              34590       98.5
Tyrosine    12.2      5        67.4           1485       24              35640       101.5
Amide N     1.55      6        110.8          903        39              35220       100.3
Cadmium*    0.77      1        6.85           14600      2               29200       81.3
Cobalt**    0.44      1        7.47           13400      3               40200       114.5

* If Scott and Fisher’s lowest cadmium value of 0.72 per cent is taken, the minimal molecular
weight is 15610 and the calculated molecular weight for 2 atoms is 88.9 per cent of 35100.

** If Scott and Fisher’s highest cobalt value of 0.46 per cent is taken, the minimal molecular
weight is 12810, and the calculated molecular weight for 3 atoms is 109.4 per cent of
35100. Their lowest value of 0.41 per cent calculated for 2 atoms gives 81.8 per cent.

1 Scott, D. A., & Fisher, A. D. Biochem. Jour. 29: 1048. 1935.
2 Cohn, E. J., Ferry, J. D., Livingood, J. J., & Blanchard, M. H. Science 90: 183. 1939.
3 Miller, G. L., & du Vigneaud, V. Jour. Biol. Chem. 118: 101. 1937.
4 du Vigneaud, V., Miller, G. L., & Rodden, C. J. Jour. Biol. Chem. 131: 631. 1939.
5 du Vigneaud, V., Jensen, H., & Wintersteiner, O. Jour. Pharmacol. 32: 367. 1928.
6 du Vigneaud, V. Cold Spring Harbor Symposia Quant. Biol. 6: 275. 1938.
7 Svedberg, T. Nature 127: 438. 1931.


Scott and Fisher’s value fits well for 3 zinc atoms, that of Cohn and 
his associates fits even better for 2 atoms. Scott and Fisher’s cadmium 
and cobalt values in crystals obtained with these metals do not 
resolve the problem. However, the sulfur, cystine, and tyrosine 
values fit the requirements of the ultracentrifuge determinations of 
Svedberg very satisfactorily. 
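
The arithmetic behind the columns of TABLE 5 can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the formula weights are assumed modern values rather than figures quoted in the paper, so the last digit may differ slightly from the printed table.

```python
# Formula weights are assumed modern values, not figures quoted in the paper.
FORMULA_WEIGHT = {"zinc": 65.38, "sulfur": 32.06, "tyrosine": 181.19}


def minimal_molecular_weight(percent, formula_weight):
    """Weight of protein that would contain exactly one atom or residue."""
    return 100.0 * formula_weight / percent


def calculated_molecular_weight(percent, formula_weight, n_units):
    """Molecular weight implied by assuming n_units atoms or residues per molecule."""
    return n_units * minimal_molecular_weight(percent, formula_weight)


def percentage_of_assumed(percent, formula_weight, n_units, assumed_mw):
    """Calculated molecular weight expressed as a percentage of the assumed value."""
    return 100.0 * calculated_molecular_weight(percent, formula_weight, n_units) / assumed_mw


if __name__ == "__main__":
    ASSUMED_MW = 35100  # insulin, as assumed in TABLE 5
    rows = [("zinc", 0.52, 3), ("zinc", 0.36, 2), ("sulfur", 3.34, 36), ("tyrosine", 12.2, 24)]
    for name, percent, n_units in rows:
        fw = FORMULA_WEIGHT[name]
        print(name, percent, round(minimal_molecular_weight(percent, fw)), n_units,
              round(percentage_of_assumed(percent, fw, n_units, ASSUMED_MW), 1))
```

The choice between 2 and 3 zinc atoms discussed above amounts to asking which integer brings the last column closest to 100 for a given analysis.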


Serum Albumin 


Serum albumin (TABLE 6) presents greater difficulties for discus- 
sion from the analytical point of view than the proteins hitherto men- 
tioned since there is every reason to believe, today, that what has 
usually been isolated as serum albumin is a mixture of proteins that 
differ from each other, for example, in tryptophane content. Early 
data on the tryptophane of serum albumin indicated 0.53 per cent.⁷¹
This result was obtained from a sample crystallized twice by Wyman
in Cohn’s laboratory. Another sample prepared by Ferry gave 0.52
per cent of tryptophane. Hewitt’s more recent work on serum
albumin⁷² indicates that fractions can be secured of which the “purest”


71 Folin, O., & Marenzi, A. D. Jour. Biol. Chem. 83: 89. 1929.
72 Hewitt, L. F. Biochem. Jour. 30: 2229. 1936.




yielded 0.26 and 0.30 per cent of tryptophane, while the albumin 
separated from the mother liquors gave as much as 1.0 per cent. On 
the other hand, the most recent work of McMeekin in Cohn’s laboratory⁷³
has resulted in the crystallization of a carbohydrate-free serum
albumin fraction of nitrogen content 16.2 per cent in the form of a 
sulfate and 16.8 per cent in the isoelectric condition. Hewitt’s material 
had only 14.1 to 14.4 per cent of nitrogen. McMeekin’s preparation 
yielded 0.53 per cent of tryptophane, by Lugg’s modification of the 
Folin and Marenzi procedure, and thus resembles the earlier prepara- 
tions of Wyman and Ferry from the same laboratory. Furthermore 
it was demonstrated to be essentially homogeneous in the Tiselius ap- 
paratus and to have a constant solubility. This provides a second 


TABLE 6
SERUM ALBUMIN (HORSE)
Assumed Molecular Weight 73000

              Observed  Author-  Moles per gm.  Minimal    Assumed number  Calculated  Percentage
              per cent  ity      × 10⁻⁵         molecular  of atoms or     molecular   of 73000
                                                weight     residues        weight
Tryptophane   0.26      1        1.275          12440      6               74640       102.2
Tryptophane   0.53      2        2.597          38500      2               77000       105.4
Tyrosine      4.79      1        26.4           3781       19              71840       98.4
Amino N       1.07      1        76.4           1308       56              73250       100.3
Cystine       6.04      3        25.1           3977       18              71590       98.0


1 Hewitt, L. F. Biochem. Jour. 30: 2229. 1936.
2 McMeekin, T. L. Jour. Am. Chem. Soc. 61: 2884. 1939.
3 Folin, O., & Marenzi, A. D. Jour. Biol. Chem. 83: 103. 1929.


case of a protein of animal origin in which two widely different values 
for the same component seem equally well supported; the discrepancy 
between the two sulfur values of horse hemoglobin has already been 
mentioned. Valer has pointed out the possibility that the genetic 
origin of the hemoglobin of the dog may have significance in its effect 
upon the sulfur content. One may well ask if this may be a factor 
in the cases of the hemoglobin and the serum albumins of the horse. 

There is a greater degree of uniformity in the tyrosine values for 
serum albumin, the early ones of Folin and Marenzi being 4.66 and
4.77 per cent respectively. Hewitt’s purest fractions gave 4.74 and 
4.79 per cent as compared with earlier less carefully fractionated 
material of 4.7 per cent.74 The more soluble fractions, however, gave 
5.38 and 6.06 per cent. 


73 McMeekin, T. L. Jour. Am. Chem. Soc. 61: 2884. 1939. 
74 Hewitt, L. F. Biochem. Jour. 28: 2080. 1934.




TABLE 7
EDESTIN (HEMPSEED)
Assumed Minimal Molecular Weight 56000
Molecular Weight 304000

                Observed  Author-  Moles per gm.  Minimal    Assumed number  Calculated  Percentage
                per cent  ity      × 10⁻⁵         molecular  of atoms or     molecular   of 56000
                                                  weight     residues        weight
Arginine        16.76     1        96.29          1039       54              56100       100.2
Histidine       2.08      2        13.42          7456       8               59650       106.5
Lysine          2.19      2        14.98          6673       8               53380       95.3
Tyrosine        4.54      3        25.07          3988       14              55830       99.7
Tryptophane     1.46      3        7.16           13970      4               55880       99.8
Cystine         1.36      4        5.66           17660      3               52980       94.6
Cysteine        1.36      4        11.23          8904       6               53420       95.3
Methionine      2.39      5        16.02          6243       9               56190       100.3
Glutamic acid   19.2      6        130.5          766        73              55920       99.8
Aspartic acid   10.2      6        76.6           1305       43              56120       100.2
Sulfur          0.880     7        27.46          3642       15              54630       97.5
Sulfide sulfur  0.346     7        10.79          9264       6               55580       99.2
Amide N         1.88      8        134.3          745.0      75              55870       100.0


1 Vickery, H. B. Jour. Biol. Chem. 132: 325. 1940.
2 Vickery, H. B., & Leavenworth, C. S. Jour. Biol. Chem. 76: 707. 1928.
3 Folin, O., & Marenzi, A. D. Jour. Biol. Chem. 83: 89. 1929.
4 Bailey, K. Biochem. Jour. 31: 1396. 1937.
5 Baernstein, H. D. Jour. Biol. Chem. 106: 451. 1934.
6 Jones, D. B., & Moeller, O. Jour. Biol. Chem. 79: 429. 1928.
7 Osborne, T. B. Jour. Am. Chem. Soc. 24: 140. 1902.
8 Osborne, T. B., & Harris, I. F. Jour. Am. Chem. Soc. 25: 323. 1903.
9 Svedberg, T. Ind. Eng. Chem., Anal. Ed. 10: 113. 1938.


Folin and Marenzi carried out cystine determinations on the crys- 
talline samples already mentioned finding 6.02 and 6.06 per cent 
respectively. Unfortunately no data for sulfur are given by Hewitt, 
nor by McMeekin. In a later paper, however, Hewitt⁷⁵ states that
the cystine content of his crystalbumin is 5.8 per cent which agrees 
well with the Folin and Marenzi value. 


Edestin 


The calculations on edestin (TABLE 7) are introduced largely to
show how the analytical approach to the determination of molecular 
weight, or rather to the selection of a most probable molecular weight, 
breaks down when a protein of high molecular weight is taken. The 
best that can be done is to assume an arbitrary value that conforms 
closely with some of the data and to develop the consequences of the 
choice. In the present case, the tryptophane value is taken as correct 
and it is assumed that the molecular weight is a multiple of 14000. 


75 Hewitt, L. F. Biochem. Jour. 31: 360. 1937.




In order to accommodate the cystine value, 56000 must be selected 
and the table is constructed on the assumption that this is a sub- 
multiple of the true molecular weight. Five times this is 280000, 
six times is 336000, and neither of these agrees very well with Sved- 
berg’s ultracentrifuge result. 

The selection of analytical data is fairly easily made for this 
protein since it has repeatedly been carefully studied. The tyrosine 
value, 4.54 per cent, was obtained by Folin and Marenzi on a sample 
recorded as “not crystallized.” This, however, does not necessarily 
connote a lack of purity. A crystalline sample obtained from Osborne 
gave 4.28 per cent tyrosine. Other values in the literature, however, 
are 4.58 per cent⁷⁶ and 4.53 per cent⁷⁷ and the weight of evidence
therefore favors the higher result of Folin and Marenzi. These 
same workers have also obtained tryptophane values that agree well. 
Folin and Marenzi got 1.45 and 1.46 per cent on their two prepara- 
tions, Looney got 1.52 per cent, and Folin and Ciocalteu 1.51 per cent. 

The cystine value of 1.36 per cent obtained by Bailey, in the 
course of a careful comparison of the results of various cystine 
methods on this protein, is supported by Folin and Marenzi’s value 
of 1.35 per cent, Sullivan and Hess’ (iodometric) value of 1.32 per 
cent, and is only a little higher than Vickery and White’s value of 
1.25 per cent by the cuprous mercaptide method. 

The methionine value of Baernstein, obtained by the titration of 
homocysteine thiolactone, was confirmed almost precisely (2.38 per 
cent) in a later paper in which the volatile iodide method was used. 
Bailey has also obtained a value of 2.3 per cent by the volatile iodide 
method. None of these values contains the corrections estimated 
from recovery experiments by Kassell and Brand. 

The sulfur value of Osborne falls in line with these results for 
cystine and methionine and his sulfide sulfur again agrees quite well 
with the cystine. The amide nitrogen value indicates that all of the 
glutamic acid may be present as glutamine in this protein and all of 
the aspartic acid may be free. That this is not the case is shown by 
Damodaran’s isolation of asparagine from edestin.®5 


SUMMARY 


The view that amino acids are the proximate components of the 
protein molecule was first clearly expressed by Ritthausen. The 
implications of this idea were explored by Kossel, Fischer, and Hofmeister


76 Looney, J. M. Jour. Biol. Chem. 69: 519. 1926.
77 Folin, O., & Ciocalteu, V. Jour. Biol. Chem. 73: 627. 1927.


but were largely ignored by Kühne and his pupils. During
the past one hundred and twenty years, more than forty different 
substances or preparations have been isolated from proteins and
claimed, with more or less evidence, to represent individual amino
acid units of which the protein molecule is formed. These are listed
in TABLE 1 and are classified into four groups. Eighteen of them are
universally recognized as common protein constituents, and seven 
more occupy a somewhat special position, either by reason of their 
limited distribution or because certain details of their mode of com- 
bination in the protein are not understood. 

In addition there are five known amino acids, found in plant tissues 
and, in a few cases, in animal tissues which are by no means im- 
probable, though doubtless rare, constituents of proteins although 
this has not yet been shown. Some seventeen other substances have 
been at one time or another reported as protein constituents but 
without wholly satisfactory evidence. 

The well known protein amino acids are classified in TABLE 2 in 
accordance with our knowledge of their analytical chemistry. For 
eight amino acids there are at present no dependable analytical 
methods whatever; knowledge of most of them rests entirely on rela- 
tively crude isolation methods. For an additional group of six mono- 
amino acids, fairly good preparation methods exist, but these are far 
from quantitative. The current researches of Bergmann are, how- 
ever, revealing methods by which these substances can be deter- 
mined with a high degree of probable accuracy and satisfactory de- 
terminations of them in a series of proteins are to be looked for in the 
near future. 

A final group can be formed of those amino acids for which colori- 
metric or isolation methods of high probable accuracy have been de- 
veloped. It is with these substances alone that trustworthy molecu- 
lar ratios and minimal molecular weights can be calculated at the 
present time. 

In TABLES 3 to 7 are presented calculations of the amino acid composition
of five proteins which have been shown to be nearly, if not
entirely, homogeneous chemical substances. Selections of data for 
these calculations have been made from papers which contain evi- 
dence that unusual care has been exercised in the effort to secure ac- 
curate results. In many cases the data selected have been con- 
firmed by other methods or by other investigators. The results of 
the calculations of the minimal molecular weights, in combination 
with molecular weights derived from physico-chemical considera- 




tions, permit the selection of most probable integers for the number 
of amino acid residues per molecule. A picture of the composition 
of the protein molecule can thus be drawn, and from the agreement, 
or lack of agreement, of the molecular weights calculated from the 
analytical data with the value secured by physical methods, an esti- 
mate of the trustworthiness of both chemical and physical data can 
be made. It is found that a surprising number of analytical deter- 
minations fit well with each other and provide valuable cumulative 
evidence of the validity of the assumptions upon which these calcu- 
lations rest. 

Full advantage is also taken of the cases, such as hemoglobin and 
insulin, where a metal atom forms one of the units of the structure, 
and the relationships of the two known sulfur-containing amino acids 
with the total sulfur are also shown to be satisfactory in most cases. 


EVIDENCE FROM PHYSICAL CHEMISTRY RE- 
GARDING THE SIZE AND SHAPE OF PROTEIN 
MOLECULES FROM ULTRA-CENTRIFUGATION, 

DIFFUSION, VISCOSITY, DIELECTRIC DIS- 
PERSION, AND DOUBLE REFRACTION 
OF FLOW 


By J. L. ONCLEY


From the Department of Physical Chemistry, Harvard Medical School, 
Boston, Massachusetts 


The estimation of size and shape of protein molecules has been at- 
tempted by many kinds of physico-chemical measurements. Nearly 
all of these methods have both experimental and theoretical uncer- 
tainties. In the attempt to extend the data so obtained to what can 
well be called the second approximation, some of the uncertainties 
have perhaps been forgotten in our enthusiasm. In this paper we 
shall try to discuss the major uncertainties, but shall avoid any un- 
necessary discussion of experimental techniques. We shall further 
confine our discussion as much as possible to cases where only one 
protein component is present, neglecting the interactions of one protein 
on another, and the problems involved in the dissociation or associa- 
tion of protein molecules. 

The very fact that these second approximations are under dis- 
cussion here is perhaps the highest tribute that we can pay to Pro- 
fessor The Svedberg and his associates, to whom we are largely in- 
debted for many techniques¹ which were in the form of preliminary
investigations only fifteen years ago. 


SEDIMENTATION EQUILIBRIUM 


From a theoretical standpoint, the sedimentation equilibrium 
method of Svedberg is perhaps the most desirable method for the 
evaluation of molecular size.² It involves the measurement of the
protein concentration, $c$, as a function of the distance from the center
of rotation, $x$, when an equilibrium between centrifugal and diffusion


1 Svedberg, T., & Pedersen, K. O. “The Ultracentrifuge.” Oxford Univ. Press,
London. 1940.

2 Svedberg, T. Zeit. physik. Chem. 121: 65. 1926. Tiselius, A. Zeit. physik. Chem.
124: 449. 1926. Pedersen, K. O. Zeit. physik. Chem. A170: 41. 1934. See reference 1,
pp. 48-57.






forces is set up. The relation between these quantities, the speed
of rotation, $\omega$, the partial specific volume, $\bar{v}$, and the molecular weight,
$M$, can be shown thermodynamically to be independent of the shape
and hydration of the protein, and given by the equation

$$ M = \frac{2RT \ln (c_2/c_1)}{(1 - \bar{v}\rho)\,\omega^2 (x_2^2 - x_1^2)} \tag{1} $$

Here $R$ is the gas constant (8.313 × 10⁷ ergs per mole per degree C.), $T$,
the absolute temperature, and $\rho$ the density of the solution. This
method has been used by Svedberg and his co-workers for the deter- 
mination of the molecular weights of most of the substances which 
we intend to discuss in this symposium, and it is difficult for us 
to evaluate the probable accuracy of the determinations. A discus- 
sion of the possible sources of error, however, will be of some use in 
weighing evidence from this source. 
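
As a rough illustration of how equation (1) is applied to a pair of concentration readings, the sketch below evaluates it in c.g.s. units. The function names and the numerical inputs are hypothetical; the synthetic profile is generated by inverting the same relation, so the example checks only self-consistency, not any measured result.

```python
import math

R_ERG = 8.313e7  # gas constant in ergs per mole per degree, the value quoted in the text


def angular_velocity(rpm):
    """Convert revolutions per minute to radians per second."""
    return 2.0 * math.pi * rpm / 60.0


def molecular_weight_two_point(c1, c2, x1, x2, omega, v_bar, rho, temp_k):
    """Equation (1): molecular weight from concentrations c1, c2 at distances x1, x2 (cm)."""
    return (2.0 * R_ERG * temp_k * math.log(c2 / c1)
            / ((1.0 - v_bar * rho) * omega ** 2 * (x2 ** 2 - x1 ** 2)))


def equilibrium_concentration(c_ref, x_ref, x, mol_weight, omega, v_bar, rho, temp_k):
    """Synthetic ideal equilibrium profile, obtained by solving equation (1) for c."""
    exponent = (mol_weight * (1.0 - v_bar * rho) * omega ** 2 * (x ** 2 - x_ref ** 2)
                / (2.0 * R_ERG * temp_k))
    return c_ref * math.exp(exponent)


if __name__ == "__main__":
    omega = angular_velocity(10000)  # assumed rotor speed
    c2 = equilibrium_concentration(1.0, 5.5, 6.0, 68000.0, omega, 0.749, 1.0, 293.15)
    print(molecular_weight_two_point(1.0, c2, 5.5, 6.0, omega, 0.749, 1.0, 293.15))
```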

Probably the most obvious source of error is the possibility of not 
having set up a true equilibrium between the centrifugal and dif- 
fusion forces. For molecules of the size of those reported here this 
requires times ranging from several days to perhaps a week. This 
is usually tested by taking a series of pictures at 12- or 24-hour in- 
tervals, until perhaps three consecutive pictures give identical results 
within the expected experimental error. The possibility exists, how- 
ever, that this state may not be one of true equilibrium, since the 
additional effects of thermal gradients and vibration may be affect- 
ing the results. Svedberg and the other workers in this field have 
taken great pains to guard against this possibility. 

Another possible source of error involves the possibility of some 
change taking place in the protein molecules because of the long 
times involved in these measurements. The necessity for the material 
to remain at 20° or 25° C. for prolonged periods without any change 
restricts this method to only the most stable of the proteins. 

The other sources of error are the limiting accuracies with which 
the concentrations can be measured in the ultracentrifuge, the con- 
stancy of the speed of the rotor, and the values used for the partial 
specific volume of the protein, $\bar{v}$. We will reserve the discussion of
this last point till later. The error in evaluating the concentration 
as a function of the distance is difficult to ascertain. It would 
obviously depend a great deal upon the method used; those in common 




3 Mason, M., & Weaver, W. Phys. Rev. 23: 412. 1924. Weaver, W. Phys. Rev.
27: 499. 1926; Zeit. physikal. Chem. 43: 396. 1927; 49: 311. 1928. Archibald, W. J.
Phys. Rev. 53: 746. 1938; 54: 371. 1938. See reference 1, pp. 56-57.




use are the scale-displacement method of Lamm, the slit method of 
Toepler, and the light absorption method of Svedberg. The first of
these is probably the most accurate for this purpose, and the most 
widely used at present. The experimental errors involved in com- 
paring and calculating concentrations from the scale-displacement 
method are usually of a slightly larger magnitude than those to be 
discussed in the study of diffusion constants, and could readily 
amount to perhaps 5%.⁴ Measurements are usually recorded by
merely giving values of the molecular weight as calculated in different
portions of the cell, values of $x_1$ and $x_2$ being chosen at 0.5 mm.
intervals. The parts of the cell nearer the center, where $c$ changes
very slowly, are poorly suited to this means of calculation and are
often omitted. The practice of plotting log $c$ against $x^2$ would seem
a more fitting one,⁵ the curve so obtained being linear if the solution
is monodisperse. The slope would be equal to

$$ \frac{M\omega^2 (1 - \bar{v}\rho)}{4.606\,RT}. $$
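
The slope method can be sketched as an ordinary least-squares fit of log₁₀ c against x². The helper names and the synthetic readings below are assumptions made for illustration; the factor 4.606 is the one given in the text (twice 2.303).

```python
import math

R_ERG = 8.313e7  # ergs per mole per degree, the value quoted in the text


def least_squares_slope(xs, ys):
    """Slope of the best straight line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def molecular_weight_from_log_plot(distances_cm, concentrations, omega, v_bar, rho, temp_k):
    """Molecular weight from the slope of log10(c) plotted against x squared."""
    x_squared = [x * x for x in distances_cm]
    log_c = [math.log10(c) for c in concentrations]
    slope = least_squares_slope(x_squared, log_c)
    return 4.606 * R_ERG * temp_k * slope / (omega ** 2 * (1.0 - v_bar * rho))


if __name__ == "__main__":
    omega = 2.0 * math.pi * 10000 / 60.0  # assumed rotor speed of 10000 r.p.m.
    true_m, v_bar, rho, temp_k = 40000.0, 0.75, 1.0, 293.15
    k = true_m * (1.0 - v_bar * rho) * omega ** 2 / (2.0 * R_ERG * temp_k)
    xs = [5.5 + 0.05 * i for i in range(11)]            # readings at 0.5 mm intervals
    cs = [math.exp(k * (x * x - 5.5 ** 2)) for x in xs]  # ideal monodisperse profile
    print(molecular_weight_from_log_plot(xs, cs, omega, v_bar, rho, temp_k))
```

Unlike the averaging of two-point values, every reading contributes to the fitted slope, so the uncertain extrapolated end points carry less weight.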


When the system under analysis is polydisperse, the plot of log $c$
against $x^2$ is not linear.⁶ It has been shown that the values for the
molecular weight calculated at various values of $x$ in such a system
increase as $x$ increases. The absolute values obtained depend to a
large extent upon the method used for the calculation. Three different
ways of averaging the weight at any value of $x$ are in use:
(1) the number-average, $M_n = \sum_i n_i M_i / \sum_i n_i$;
(2) the weight-average, $M_w = \sum_i n_i M_i^2 / \sum_i n_i M_i = \sum_i w_i M_i / \sum_i w_i$;
(3) the z-average, $M_z = \sum_i n_i M_i^3 / \sum_i n_i M_i^2$.
Here $n_i$ and $w_i$ represent the number and total
weight of molecules of molecular weight $M_i$. Lansing and Kraemer⁷
have discussed this problem very thoroughly, and give equations for 
the calculation of any of these quantities. The z-average is most 
easily obtained if the Lamm scale method is used, but weight-averages 


4 Lansing and Kraemer state⁷ that the errors “rarely exceed 10%, except in extremely
unfavorable cases.” Pedersen¹ (p. 305) states that “the accuracy of M determined from
a single sedimentation equilibrium experiment cannot be put higher than ±(5-10%).”

5 When values of M are calculated from equation (1) using equal $x_2 - x_1$ intervals, it
can be rather easily shown that all but the initial and final concentrations cancel because
of the logarithmic nature of the equation. (See Roseveare, W. E. Jour. Am. Chem.
Soc. 53: 1651. 1931.) This is mentioned by Pedersen (p. 305) when he says “when
only the average M is wanted, it is sufficient to know the concentration at the top and the
bottom of the cell, since all the intermediate values cancel out when the mean is taken.
This is, however, a drawback, as just the concentrations at the top and at the bottom of
the cell are the ones that are most uncertain, since these can only be found by extrapolation.”

6 The polydispersity of a protein solution can be detected more readily by observations 
of the sedimentation velocity of the solution, as is discussed in a later section. 

7 Lansing, W. D., & Kraemer, E. O. Jour. Am. Chem. Soc., 57: 1369. 1935. See 
also reference 1 pp. 342-9. 




can be obtained by means of a graphical integration of the z vs. x curve. 
Number-averages cannot, in general, be determined with the relia- 
bility of the other averages, but are extremely useful since it is this 
number-average which is to be compared with osmotic pressure 
methods. The calculation of a “non-uniformity coefficient” can be 
obtained from any of these averaging methods. This problem will 
not be further discussed here since all of the molecules under discus- 
sion here have been shown to be monodisperse. 
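
The three averages differ only in the power of the molecular weight used as the weighting factor, which a short sketch makes plain. The function names and the two-component mixture are hypothetical and serve only to show that the averages coincide for a monodisperse sample and spread apart for a mixture.

```python
def number_average(counts, weights):
    """M_n = sum(n_i M_i) / sum(n_i)."""
    return sum(n * m for n, m in zip(counts, weights)) / sum(counts)


def weight_average(counts, weights):
    """M_w = sum(n_i M_i^2) / sum(n_i M_i)."""
    return (sum(n * m ** 2 for n, m in zip(counts, weights))
            / sum(n * m for n, m in zip(counts, weights)))


def z_average(counts, weights):
    """M_z = sum(n_i M_i^3) / sum(n_i M_i^2)."""
    return (sum(n * m ** 3 for n, m in zip(counts, weights))
            / sum(n * m ** 2 for n, m in zip(counts, weights)))


if __name__ == "__main__":
    # Hypothetical mixture: equal numbers of molecules of weight 10000 and 30000.
    counts, weights = [1.0, 1.0], [10000.0, 30000.0]
    print(number_average(counts, weights),
          weight_average(counts, weights),
          z_average(counts, weights))
```

For any polydisperse sample the ordering is number-average ≤ weight-average ≤ z-average, which is why the ratio of two of them can serve as a measure of non-uniformity.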

In some cases errors are introduced because of deviation of the 
protein from Henry’s law. By keeping the concentration of protein 
at low values it is found that this source of error would apply only in 
extreme cases, and probably does not introduce any appreciable error 
for the molecules under discussion. Very large degrees of asymmetry 
will cause deviation from Henry’s law to occur at lower protein con- 
centration. 

OSMOTIC PRESSURE 


Another method which can be derived along strictly thermodynamic 
lines⁸ is through the measurement of the osmotic pressure of solutions
under proper conditions. It has been shown by Sørensen, Adair, Burk
and others⁹ that such measurements of osmotic pressure can lead to
reliable molecular weights when proper corrections for membrane 
effects are made, and when the observations are made in sufficiently 
dilute protein concentrations. 

In order to obtain such molecular weights, we must take account 
of (1) the “ion-difference pressure,” $p_i$ (briefly described as the correction
term for the unequal distribution of diffusible ions); (2) the
effect of forces between ions and molecules (usually represented by 
the osmotic coefficient, g); and (3) the volume of the hydrated pro- 
tein molecules. We must further assume that there is no change in 
the state of aggregation with concentration. For protein solutions, 
this last assumption is usually fulfilled, and the effect of factor (3) is 
small when we deal with fairly dilute solutions. By measuring the 
total osmotic pressure, $\pi$, over a wide range of protein concentrations,
holding the electrolyte concentration (and pH) constant, we may
often simultaneously eliminate the effects of the first three factors
by extrapolation of $\pi/C$ values to $C \to 0$ (where $C$ is the protein
concentration in grams of dry protein per liter of solution). This 

8 Donnan, F. G., & Guggenheim, E. A. Zeit. physikal. Chem. 162A: 346. 1932.
et ae Trans. Faraday Soc. 31: 98. 1935. Donnan, F. G. Trans. Faraday Soc.

9 See Cohn, E. J. Chem. Rev. 24: 203. 1939 for references to and discussions of this
work.




necessitates making measurements at sufficiently low protein concentrations
so as to make the extrapolation to $C \to 0$ as certain as possible,
which may be experimentally difficult if the effects of factors
(1) and (2) are large. When dealing with proteins at or near their 
isoelectric point, the ion-difference pressure may be unimportant, but 
when this is not true, it may be independently evaluated by measure- 
ments of membrane potentials, and such measurements are especially 
valuable since they enable separate estimates of the effects of factors 
(1) and (2) to be made, and thus provide definite proof as to whether 
the simple formula 
$$ M = RT\,(C/\pi)_{C \to 0} \tag{2} $$

may be applied (which is possible only when $g \to 1$ and $p_i \to 0$ as
$C \to 0$). The applicability of equation (2) can be further studied by
making extrapolations of $\pi/C$ to $C \to 0$ for various electrolyte concentrations;
a value of $M$ thus calculated which is independent of the
electrolyte concentration would indicate that the equation could be
safely applied.¹⁰
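
The extrapolation in equation (2) can be sketched as a straight-line fit of the reduced osmotic pressure against concentration, the intercept at zero concentration giving RT/M. The linear form of the fit, the unit system (atmospheres, grams per liter) and the function names are assumptions made for this illustration; real data may call for a different extrapolation function.

```python
R_L_ATM = 0.082057  # liter-atmospheres per mole per degree (assumed unit system)


def fit_line(xs, ys):
    """Least-squares straight line; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def molecular_weight_from_osmotic_pressure(conc_g_per_l, pressure_atm, temp_k):
    """Extrapolate pressure/concentration to zero concentration and apply equation (2)."""
    reduced = [p / c for p, c in zip(pressure_atm, conc_g_per_l)]
    intercept, _slope = fit_line(conc_g_per_l, reduced)
    return R_L_ATM * temp_k / intercept


if __name__ == "__main__":
    # Synthetic readings for a hypothetical protein of weight 69000 with a small
    # concentration-dependent term standing in for the non-ideal effects.
    true_m, temp_k, b = 69000.0, 298.15, 2.0e-8
    concs = [5.0, 10.0, 20.0, 40.0]
    pressures = [c * (R_L_ATM * temp_k / true_m + b * c) for c in concs]
    print(molecular_weight_from_osmotic_pressure(concs, pressures, temp_k))
```

Repeating the fit at several electrolyte concentrations and comparing the resulting values of M is the consistency check described in the paragraph above.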

Measurements of this kind do not give any evidence concerning 
the mono- or polydisperse nature of the protein under investigation. 
In case the protein is polydisperse, then one obtains a “number
average” molecular weight; that is, $M_n = \sum_i n_i M_i / \sum_i n_i$, but for the
substances discussed here this is probably not a serious problem since 
they are obtainable in monodisperse preparations of high purity. The 
molecular weights calculated from osmotic pressure by extrapolation to 
infinite dilution of protein are presumably not seriously affected by 
even large amounts of hydration. Deviation from a spherical shape 
also has little effect, except that the deviations from a linear relation- 
ship between the osmotic pressure and the concentration usually begin 
at lower values of the concentration the more asymmetric the molecule. 
This method suffers the same defect as does the sedimentation-equi- 
librium method when the time required to reach an equilibrium state 
is long. 

SEDIMENTATION VELOCITY 


The measurement of the rate of migration of the protein in a high 
centrifugal field is one of the most widely used methods for the 
evaluation of size. Its popularity is mainly based on the fact that 
such experiments can be performed in a relatively short time, and 
on the fact that probably this method is the most sensitive to a dis- 

10 Analytical proof that $g \to 1$ when $p_i \to 0$ and $C \to 0$ for any arbitrary but constant
electrolyte concentration might be possible. See a discussion by Hartley, G. S., & Donnan,
F. G. Trans. Faraday Soc. 31: 106-108. 1935.




tribution of particle size, particularly when used with the newer re- 
fractive index methods of analysis of Lamm. 

The results obtained by this method are not thermodynamic in 
nature and are not immediately convertible to values of molecular 
weight. The result obtained is a “sedimentation constant,” s, defined 
as the rate at which the boundary moves under unit centrifugal field: 


$$ s = \frac{dx/dt}{\omega^2 x} \tag{3} $$
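
Since the rate of boundary movement per unit field is the time derivative of ln x divided by ω², the sedimentation constant can be estimated from the slope of ln x against time. The sketch below assumes a constant rotor speed and omits the viscosity correction; the function names and the synthetic boundary positions are hypothetical.

```python
import math


def least_squares_slope(xs, ys):
    """Slope of the best straight line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def sedimentation_constant(times_s, boundary_cm, rpm):
    """Estimate s (seconds) from boundary positions recorded at constant rotor speed."""
    omega = 2.0 * math.pi * rpm / 60.0
    slope = least_squares_slope(times_s, [math.log(x) for x in boundary_cm])
    return slope / omega ** 2


if __name__ == "__main__":
    rpm, s_true = 60000, 4.4e-13  # assumed speed and a hypothetical constant
    omega = 2.0 * math.pi * rpm / 60.0
    times = [600.0 * i for i in range(7)]
    positions = [6.0 * math.exp(s_true * omega ** 2 * t) for t in times]
    print(sedimentation_constant(times, positions, rpm))
```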


This sedimentation constant¹² so obtained can be converted into an
estimated size by considering the frictional resistance to sedimentation 
as evaluated by Stokes’ law. For a spherical and unsolvated particle 
this resistance (on a molar basis) is: 


$$ f_0 = 6\pi\eta N r = 6\pi\eta N \left(\frac{3M\bar{v}}{4\pi N}\right)^{1/3} \tag{4} $$


or for a non-spherical solvated particle 
$$ f = 6\pi\eta N r\,(f/f_0) \tag{5} $$


where (f/fo) is the frictional ratio which measures the departure of 
the molecule from the simple behavior, and is thus an indirect method 
of expressing the asymmetry and solvation of the molecule. Here 
r is taken as the radius in terms of a spherical molecule of molecular 
weight $M$ and partial specific volume $\bar{v}$, $\eta$ the viscosity of the solvent,
and $N$, Avogadro’s number. The frictional resistance of a particle
can be evaluated from the equation,

$$ f = \frac{M(1 - \bar{v}\rho)}{s} \tag{5a} $$

where $\rho$ is the density of the solution. For the sake of convenience,
s is usually corrected to the value it would have in pure water at 20° C. 
by means of the following two equations: 


$$ s_{20}' = s\,\frac{\eta'}{0.01009} \tag{6a} $$


11 Lamm, O. Nova Acta Soc. Sci. Upsala IV 10 (6). 1937; Nature 132: 820. 1933.
See reference 1, pp. 253-273.

12 Values of s obtained by the averaging of individual s values calculated from equation
(3) can be shown to be determined only by x values for the initial and the final times, in
just the same manner as that described in the note concerning sedimentation equilibrium.⁵
Accordingly, we feel that s values obtained by measuring the slope of curves of log x vs.
$t(\eta_{20}/\eta)$, where $(\eta_{20}/\eta)$ represents the average viscosity correction during the interval from
t = 0 to t = t, represent more adequate estimates of the sedimentation constant.




$$ s_{20,w} = s_{20}'\,\frac{\eta}{\eta'}\cdot\frac{1 - 0.9982\,\bar{v}_{20}}{1 - \rho\,\bar{v}_t} \tag{6b} $$

where $s_{20}'$ is the value reduced to the protein solution at 20° C., and
$s_{20,w}$ is the value reduced to water at 20° C. Here $\eta'$, 0.01009 and
$\eta$ are the viscosities of the water at t° C., water at 20° C., and the
solvent at t° C., respectively; and $\bar{v}_{20}$ and $\bar{v}_t$, the partial specific
volume of the protein at temperatures of 20° C. and t° C., respectively.
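
The net effect of the two reductions can be sketched as a single function. This assumes that the two steps combine into the familiar product of a viscosity factor and a buoyancy factor, as in the standard ultracentrifuge literature; the function and parameter names are hypothetical, and the constants 0.01009 and 0.9982 are those appearing in the text.

```python
ETA_WATER_20 = 0.01009  # viscosity of water at 20 deg C (poise), value quoted in the text
RHO_WATER_20 = 0.9982   # density of water at 20 deg C (g per ml), as used in the text


def s20w(s_obs, eta_solvent_t, rho_solution_t, v_bar_t, v_bar_20=None):
    """Reduce an observed sedimentation constant to water at 20 deg C.

    Assumed combined form:
    s_obs * (eta / 0.01009) * (1 - 0.9982 * v20) / (1 - rho * vt).
    """
    if v_bar_20 is None:
        v_bar_20 = v_bar_t  # assume the partial specific volume is unchanged
    viscosity_factor = eta_solvent_t / ETA_WATER_20
    buoyancy_factor = (1.0 - RHO_WATER_20 * v_bar_20) / (1.0 - rho_solution_t * v_bar_t)
    return s_obs * viscosity_factor * buoyancy_factor


if __name__ == "__main__":
    # Hypothetical run in a salt solution slightly denser and more viscous than water.
    print(s20w(4.0e-13, eta_solvent_t=0.0110, rho_solution_t=1.005, v_bar_t=0.75))
```

A measurement already made in water at 20° C. is returned unchanged, which is a quick check that the two factors are oriented correctly.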

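Values of the sedimentation constant $s$ can be used to calculate
the molecular weight $M$ if the frictional ratio $f/f_0$ is known (or assumed),
or to calculate $f/f_0$ if $M$ is known (or assumed):

$$ M = \sqrt{162\pi^2\eta^3\bar{v}}\;N \left[\frac{(f/f_0)\,s}{1 - \bar{v}\rho}\right]^{3/2} \tag{7} $$

$$ \frac{f}{f_0} = \frac{1 - \bar{v}\rho}{6\pi\eta s}\left(\frac{4\pi M^2}{3\bar{v}N^2}\right)^{1/3} \tag{8} $$

In water at 20° C. these equations become:

$$ M = 2.45 \times 10^{22}\,\bar{v}_{20}^{1/2}\left[\frac{(f/f_0)\,s_{20,w}}{1 - 0.9982\,\bar{v}_{20}}\right]^{3/2} \tag{7a} $$

$$ \frac{f}{f_0} = 1.19 \times 10^{-15}\,\frac{M^{2/3}\,(1 - 0.9982\,\bar{v}_{20})}{s_{20,w}\,\bar{v}_{20}^{1/3}} \tag{8a} $$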

The sedimentation constants obtained from measurements on a 
number of proteins have been found to be a function of the protein 
concentration, and in order to obtain values suitable for introduction 
into the above equations, it is necessary to make a suitable extrapola- 
tion of the constants observed at finite concentrations. 

The accuracy of the extrapolations of observed values of $s$ to values
$s_{20,w}$ has been tested in various ways. The consistency of values at
various temperatures, and in the presence of such foreign materials as
urea, alcohol, and heavy water seems to indicate that the viscosity part
of Stokes’ law is adequate for most systems which, for the moment,
we are likely to study.¹³ This will be further discussed in the next
section on diffusion. The effect of the viscosity contribution of the 
protein is not so well understood, and may be the cause of a large 
part of the decrease in sedimentation constant with increasing protein 
concentration. From a theoretical standpoint most workers prefer to 
use the viscosity of the solvent (including any buffer salts, etc.) rather 
than of the solution, but certain data would bring this practice into 
question to some extent. Since the sedimentation constants should 
in any event be extrapolated to zero protein concentration, it might 
seem justifiable tentatively to use whichever viscosity leads to the 
most satisfactory extrapolation function. 


13 See, for example, Svedberg, T., & Eriksson-Quensel, I.-B. Nature 137: 400. 1936. 


128 ANNALS NEW YORK ACADEMY OF SCIENCES 


The addition of low concentrations of electrolytes has been shown 
to be necessary in order to eliminate electrical forces (Donnan effects) 
in all cases where proteins are transported. Calculations of Tiselius¹⁴
have shown that the addition of relatively small amounts of electrolytes
will repress the Donnan effect for most molecules. NaCl or KCl at 0.1 or
0.2 M is frequently used at pH values near the isoelectric
point, and is adequate for most purposes.

Several other uncertainties in the evaluation of sedimentation constants
have been discussed by Svedberg and Pedersen.¹ Perhaps the
most important of these is the effect of hydrostatic pressure in the 
cell upon the density of the solution and the partial specific volume of 
the protein. The change of viscosity with pressure is small enough 
to be almost negligible for aqueous solutions under the usual condi- 
tions. The boundary disturbances during the accelerating period, as 
well as those due to diffusion, are of some importance in certain cases, 
but in general cause no great uncertainty. 


DIFFUSION 
Measurements of the diffusion constant, D, can be used to evaluate 
the frictional resistance, f, even more directly than measurements of 
sedimentation velocity (equation 5), since 


f=RT/D (9) 


The molecular weight, M, can be calculated from the diffusion con- 
stant if we know f/fo, or the frictional ratio f/fo can be calculated if 
we know M: 


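$$M = \frac{R^3T^3}{162\,\pi^2 N^2 \eta^3 D^3\,\bar{v}}\left(\frac{f_0}{f}\right)^3 \tag{10}$$

$$\frac{f}{f_0} = \frac{RT}{6\pi\eta N D}\left(\frac{4\pi N}{3M\bar{v}}\right)^{1/3} \tag{11}$$

In water at 20° C. these equations become:

$$M = \frac{2.42 \times 10^{-14}}{D_{20,w}^3\,\bar{v}_{20}\,(f/f_0)^3} \tag{10a}$$

$$\frac{f}{f_0} = \frac{2.89 \times 10^{-5}}{D_{20,w}\,(\bar{v}_{20} M)^{1/3}} \tag{11a}$$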


where $D_{20,w}$ is the diffusion constant (cm²/sec.) corrected to water
at 20° C. by means of the equation:

$$D_{20,w} = D\,\frac{293.2}{T}\cdot\frac{\eta}{\eta_{20,w}} \tag{12}$$

14 Tiselius, A. Kolloid. Zeit. 59: 306. 1932. See reference 1, pp. 23-28.

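The diffusion relations can be exercised in the same way as those for sedimentation. The sketch below assumes the customary temperature and viscosity reduction of D to water at 20° C., works in c.g.s. units, and uses hypothetical function names and sample values; the modern values of R and Avogadro's number are likewise assumptions.

```python
import math

R = 8.314e7              # gas constant, erg/(mol K) (modern value; an assumption)
N_A = 6.022e23           # Avogadro's number (modern value; an assumption)
T20 = 293.2              # 20 C in degrees absolute, as used in the text
ETA_WATER_20 = 0.01009   # viscosity of water at 20 C (poise), as quoted in the text


def d_reduced_to_water_20(d_obs, temp_k, eta_solvent_t):
    """Assumed standard reduction of D to water at 20 C."""
    return d_obs * (T20 / temp_k) * (eta_solvent_t / ETA_WATER_20)


def frictional_ratio_from_d(M, d20w, vbar):
    """Equation (11) in water at 20 C: f/f0 from D and a known M."""
    return (R * T20 / (6 * math.pi * ETA_WATER_20 * N_A * d20w)
            * (4 * math.pi * N_A / (3 * M * vbar)) ** (1 / 3))


def mol_weight_from_d(f_ratio, d20w, vbar):
    """Equation (10) in water at 20 C: M from D and a known f/f0."""
    return (R * T20) ** 3 / (
        162 * math.pi ** 2 * N_A ** 2 * ETA_WATER_20 ** 3
        * d20w ** 3 * vbar * f_ratio ** 3
    )


# Illustrative inputs (hypothetical, not taken from TABLE 1).
d20w = d_reduced_to_water_20(d_obs=8.9e-7, temp_k=298.2, eta_solvent_t=0.00894)
ratio = frictional_ratio_from_d(44000.0, d20w, 0.749)
print(f"D(20,w) = {d20w:.3e} cm2/sec, f/f0 = {ratio:.3f}")
```

With all variables set to unity the two functions return the numerical prefactors of (10a) and (11a) to within about one per cent.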



The measurement of the diffusion constant can be carried out either
by “free diffusion” (Lamm & Polson)¹⁵ or by use of sintered glass
discs (Northrop & Anson).¹⁶ Both methods are fairly widely used.
The former method is an absolute measurement involving no primary 
standards, and is the most advantageous in most cases. The sintered 
glass discs, on the other hand, require a calibration of some sort to 
evaluate the pore size, and the selection of a primary standard is 
not an easy task. Also there is little evidence obtainable from the 
latter method in regard to the mono-disperse nature of the material 
and to the symmetry of the diffusion boundary, both of which are of 
considerable importance in the study of an unknown molecule. 


In the free diffusion method we usually evaluate the quantity $dn/dx$
as a function of the distance x by means of a scale photograph or
Schlieren method, and we may then evaluate the diffusion constant in 
several ways.® 17 The two most commonly used, and probably the 
most precise are (1) the measurement of the width of the peak
at a point where $dn/dx$ (or z) is equal to $(1/e)(dn/dx)_{max}$ (or $z_{max}/e$), and (2) the

measurement of the maximum height of the peak and the area en- 
closed by the peak. The agreement between these methods is usually 
of the order of 3-5%. When dealing with monodisperse systems it is 
usually found that the curves representing the distribution of protein 
as a function of the distance agree well with ideal distribution curves, 
and this method has been used as an indication of the monodisperse 
nature of certain protein preparations. Deviations can likewise be 
used to evaluate the degree of polydispersity of a material. In certain 
cases it is observed that the diffusion curve is asymmetrical. The 
reasons for this behavior are not altogether clear, and cannot be ex- 
plained simply by the presence of several components. The calcula- 
tions of diffusion constants under these conditions are necessarily very 
uncertain. 

The adequacy of the viscosity and temperature corrections can best
be demonstrated by several measurements of diffusion constants at
temperatures between 25° C. and 5° C.¹⁸ When these two sets of data
are corrected to 20° C., they are in excellent agreement. The viscosity
effect has also been studied by variation of the salts and other 
molecules of the solvent. 


15 Lamm, O., & Polson, A. G. Biochem. Jour. 30: 528. 1936. Lamm, O. Nova
Acta Soc. Sci. Upsala IV 10: (6). 1937.

16 Northrop, J. H., & Anson, M. L. Jour. Gen. Physiol. 12: 548. 1929.

17 Williams, J. W., & Cady, L. C. Chem. Rev. 14: 171. 1936.

18 Polson, A. G. Kolloid. Zeit. 87: 149. 1939. Mehl, J. W. Unpublished work.




SEDIMENTATION VELOCITY AND DIFFUSION 


The equations (7) and (8) for sedimentation velocity, and (10) and 
(11) for diffusion, involve both the molecular weight M and the fric- 
tional ratio f/fo. By their proper combination, however, we obtain 
two new equations which uniquely determine these quantities: 


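$$M = \frac{RTs}{D\,(1 - \bar{v}\rho)} \tag{13}$$

$$\frac{f}{f_0} = \frac{1}{6\pi\eta}\left(\frac{RT}{DN}\right)^{2/3}\left[\frac{4\pi(1 - \bar{v}\rho)}{3\bar{v}s}\right]^{1/3} \tag{14}$$

In water at 20° C. these equations become:

$$M = \frac{2.44 \times 10^{10}\,s_{20,w}}{D_{20,w}\,(1 - 0.9982\,\bar{v}_{20})} \tag{13a}$$

$$\frac{f}{f_0} = 1.00 \times 10^{-8}\left[\frac{1 - 0.9982\,\bar{v}_{20}}{D_{20,w}^2\,s_{20,w}\,\bar{v}_{20}}\right]^{1/3} \tag{14a}$$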

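The combination of sedimentation velocity and diffusion can be worked through numerically. The sketch below evaluates equations (13) and (14) in water at 20° C. in c.g.s. units; the function names, the modern values of R and Avogadro's number, and the sample values of s, D and the partial specific volume are assumptions chosen for illustration, not values from TABLE 1.

```python
import math

R = 8.314e7              # gas constant, erg/(mol K) (modern value; an assumption)
N_A = 6.022e23           # Avogadro's number (modern value; an assumption)
T20 = 293.2              # 20 C in degrees absolute
ETA_WATER_20 = 0.01009   # viscosity of water at 20 C (poise), as quoted in the text
RHO_WATER_20 = 0.9982    # density of water at 20 C (g/mL), as used in the text


def mol_weight_s_d(s20w, d20w, vbar):
    """Equation (13): molecular weight from s and D, independent of shape."""
    return R * T20 * s20w / (d20w * (1.0 - RHO_WATER_20 * vbar))


def frictional_ratio_s_d(s20w, d20w, vbar):
    """Equation (14): frictional ratio from s and D."""
    buoyancy = 1.0 - RHO_WATER_20 * vbar
    return ((1.0 / (6 * math.pi * ETA_WATER_20))
            * (R * T20 / (d20w * N_A)) ** (2 / 3)
            * (4 * math.pi * buoyancy / (3 * vbar * s20w)) ** (1 / 3))


# Illustrative inputs (hypothetical): s in sec, D in cm2/sec, vbar in mL/g.
s20w, d20w, vbar = 3.55e-13, 7.76e-7, 0.749
M = mol_weight_s_d(s20w, d20w, vbar)
ratio = frictional_ratio_s_d(s20w, d20w, vbar)
print(f"M = {M:.0f}, f/f0 = {ratio:.3f}")
```

Only the ratio s/D enters the molecular weight, which is why values of s and D taken under similar conditions may be used in equation (13) when they vary with concentration in the same manner.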

Although equation (13) is not thermodynamic in nature, it can be
shown that at least in most cases the values of M so obtained are not
affected to any large extent by hydration, and represent estimates just
as reliable as those obtained from sedimentation equilibrium.¹⁹, ²⁰
Values so obtained are independent of shape. The main difficulty en- 
countered is in the selection of values of s and D for infinite protein 
dilution, which should be used in these equations. Values at finite 
concentrations do not in general vary enormously from these infinite 
dilution values, but small errors can be introduced. If s and D 
vary with concentration in the same manner, so that the ratio s/D 
is constant, values of s and D taken under similar conditions may be 
used in equation (13). Values at infinite dilution must be obtained, 
however, before f/fo values can be obtained. 

Values of the frictional ratio f/fo greater than unity may be ex- 
plained on the basis of either the asymmetry or the hydration of the 
protein molecules, or, more probably, a combination of these two 
effects. Thus the observed frictional ratio f/fo can be broken into 
two factors: 


$$\frac{f}{f_0} = \frac{f}{f_e}\cdot\frac{f_e}{f_0}$$


where the first denotes the influence of hydration, and the second the 
influence of asymmetry. As a first approximation, necessary for the 
mathematical treatment of the problem, we may assume that protein 


19 Lamm, O. Ark. Mat. Astron. och Fysik 21: 1. 1929. Lansing, W. D., & Kraemer,
E. O. Jour. Am. Chem. Soc. 58: 1471. 1936. Reference 1, pp. 20-23, 62-66.

20 It would appear that sedimentation velocity-diffusion estimates of molecular weights are
even more reliable than the sedimentation equilibrium values for most of the molecules discussed
in this conference, because of the somewhat smaller probable errors in the determination
of diffusion and sedimentation constants in comparison with probable errors of the
sedimentation equilibrium method.




molecules can be treated as ellipsoids of revolution. Using the
equations derived by Perrin²¹ for the frictional ratio of ellipsoids of
revolution, both elongated (prolate) and flattened (oblate), we may
calculate values of $f_e/f_0$ to be expected from various axial ratios, a/b,
of the ellipsoid. Here a is taken as the half-length of the ellipsoid
along the geometric axis of revolution, and b is the equatorial radius.


[Figure 1: contour plot of axial ratio, a/b, against hydration in grams of water per gram of protein.]

FIGURE 1. Values of axial ratio and hydration in accord with various frictional ratios
(contour lines denote f/fo values).


The contribution due to hydration, $f/f_e$, can be calculated from the
equation (see reference 1, p. 65)

$$\frac{f}{f_e} = \left(1 + \frac{w}{\bar{v}\rho}\right)^{1/3}$$


where w is the number of grams of water bound by one gram of protein.
A figure can then be drawn, using hydration as ordinate and axial
ratio as abscissa (on a logarithmic scale), in which various values of
the frictional ratio are given by various contour lines. Such a diagram
is given in FIGURE 1.²² This can be used for the calculation of all
amounts of hydration and asymmetry which might combine to yield
a given frictional ratio.

21 Perrin, F. Jour. Phys. Radium (7) 7: 1. 1936. Herzog, R. O., Illig, R., & Kudar,
H. (Zeit. physikal. Chem. A167: 329. 1933.) have obtained results which are identical
with those of Perrin. See reference 1, pp. 40-42.

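The way hydration and asymmetry combine into a single frictional ratio can be sketched numerically. The code below multiplies the hydration factor, (1 + w/v̄ρ)^(1/3), by a shape factor for an ellipsoid of revolution. The shape factor is written in the standard textbook form of Perrin's result, which is not quoted in the text and is therefore an assumption here, as are the function names and sample values.

```python
import math


def perrin_shape_factor(axial_ratio):
    """Shape contribution for an ellipsoid of revolution with axial ratio a/b.

    a is the half-length along the axis of revolution, b the equatorial radius,
    so a/b > 1 is elongated (prolate) and a/b < 1 is flattened (oblate).
    Standard textbook form of Perrin's equations (an assumption).
    """
    if axial_ratio == 1.0:
        return 1.0
    q = 1.0 / axial_ratio                      # q = b/a
    if q < 1.0:                                # prolate
        root = math.sqrt(1.0 - q * q)
        return root / (q ** (2 / 3) * math.log((1.0 + root) / q))
    root = math.sqrt(q * q - 1.0)              # oblate
    return root / (q ** (2 / 3) * math.atan(root))


def hydration_factor(w, inv_vbar_rho=1.34):
    """Hydration contribution; 1/(vbar*rho) = 1.34 as in footnote 22."""
    return (1.0 + w * inv_vbar_rho) ** (1 / 3)


def total_frictional_ratio(axial_ratio, w):
    """Observed f/f0 as the product of hydration and shape factors."""
    return hydration_factor(w) * perrin_shape_factor(axial_ratio)


# Two quite different models can give nearly the same frictional ratio.
print(f"a/b = 4, w = 0.0: {total_frictional_ratio(4.0, 0.0):.3f}")
print(f"a/b = 1, w = 0.6: {total_frictional_ratio(1.0, 0.6):.3f}")
```

Each contour of a diagram such as FIGURE 1 is the set of (a/b, w) pairs for which this product takes one fixed value, which is why a frictional ratio alone cannot separate the two effects.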
VISCOSITY 


Another method which is capable of giving data in regard to the 
shape and hydration of protein molecules is the study of viscosity. 
The basis of this method is somewhat less secure from both a theoretical 
and experimental point of view than the other methods already discussed.
The original derivation by Einstein of an equation relating
the viscosity to the volume fraction, $\Phi$, of a suspension of rigid spheres
whose size is large compared to the size of the solvent molecules,

$$\eta/\eta_0 - 1 = F\Phi$$

where F is equal to 2.50, has been modified by a number of more
recent investigations, and can be applied to ellipsoidal shapes. Equations
by Onsager,²³ Guth,²⁴ Burgers,²⁵ Peterlin,²⁶ and Simha²⁷ for the
case of ellipsoids, and of Kuhn²⁸ and Huggins²⁹ for long rods, are at
present available to us, and modify the equation of Einstein in that
they give values for the parameter F which are functions of the axial
ratio. For molecules of low asymmetry, as is the case of nearly all
the molecules under discussion here, the treatments of ellipsoids are
more satisfactory, and the solution of Simha seems to be the most
adequate.

The volume fraction, $\Phi$, can be expressed in terms of the protein
concentration, c (grams dry protein per liter of solution), the partial
specific volume of the anhydrous protein molecules, $\bar{v}$, the number of
grams of solvent bound by one gram of dry protein, w, and the
density of the solvent, $\rho$. We may thus rewrite the above equation

$$\eta/\eta_0 - 1 = F\bar{v}c\,[1 + w/(\bar{v}\rho)]/1{,}000.$$

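The relation between specific viscosity, shape factor and hydration can be inverted to estimate w once F is chosen. The following is a minimal sketch; the function names and sample values are hypothetical, and F = 2.5 (the value for spheres given above) is used only as an example.

```python
def viscosity_coefficient(eta_rel, c_g_per_l, vbar):
    """The quantity (eta/eta0 - 1) * 1000 / (vbar * c) plotted as contours."""
    return (eta_rel - 1.0) * 1000.0 / (vbar * c_g_per_l)


def hydration_from_viscosity(coefficient, F, vbar, rho):
    """Solve coefficient = F * (1 + w/(vbar*rho)) for w, given a shape factor F."""
    return vbar * rho * (coefficient / F - 1.0)


# Hypothetical dilute solution: 10 g/L protein with relative viscosity 1.02625.
coeff = viscosity_coefficient(eta_rel=1.02625, c_g_per_l=10.0, vbar=0.75)
w_sphere = hydration_from_viscosity(coeff, F=2.5, vbar=0.75, rho=1.0)
print(f"viscosity coefficient = {coeff:.2f}, w (if spherical) = {w_sphere:.2f}")
```

A larger F (a more asymmetric molecule) lowers the hydration needed to explain the same coefficient, which is the trade-off such a contour diagram displays.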

Using this equation and values of F obtained from Simha’s equation,³⁰
we may now draw another diagram similar to FIGURE 1, but with contour
lines for various values of $(\eta/\eta_0 - 1)\,1{,}000/\bar{v}c$. Such a diagram
is shown in FIGURE 2.

[Figure 2: contour plot of axial ratio, a/b, against hydration in grams of water per gram of protein.]

FIGURE 2. Values of axial ratio and hydration in accord with various viscosity coefficients
(contour lines denote $(\eta/\eta_0 - 1)\,1000/\bar{v}c$ values).

22 We have set $1/(\bar{v}\rho)$ equal to 1.34 for this calculation, a value which can be used for
most proteins.

23 Onsager, L. Phys. Rev. 40: 1029. 1932.

24 Guth, E. Kolloid. Zeit. 74: 147. 1936.

25 Burgers, J. M. “Second Report on Viscosity and Plasticity.” Amsterdam. 1938.
Chapter 3 (For cylinders).

26 Peterlin, A. Zeit. physik. 3: 232. 1938; Kolloid. Zeit. 86: 230. 1939 (Ellipsoids).

27 Simha, R. Jour. Phys. Chem. 44: 25. 1940.

28 Kuhn, W. Zeit. physikal. Chem. A161: 1. 1932.

29 Huggins, M. L. Jour. Phys. Chem. 42: 911. 1938; 43: 439. 1939.

30 Mehl, J. W., Oncley, J. L., & Simha, R. Science 92: 132. 1940 give tables of F
($\nu$ in their notation) as a function of the axial ratio a/b (1/p in their notation) for both
elongated (prolate) and flattened (oblate) ellipsoids of revolution.

In general it is found that the specific viscosity, $\eta/\eta_0 - 1$, is a linear
function of the concentration only at very low concentrations (less
than 1% for most of the molecules under discussion). The available
data are best treated by an extrapolation of $(\eta/\eta_0 - 1)/c$ vs. concentration
curves to infinite dilution.³¹ The values so obtained in various
investigations are not in very good agreement, and show the need for
more precise measurements of the viscosities of very dilute protein
solutions.

31 Polson, A. G. Nature 137: 740. 1936; Kolloid. Zeit. 88: 51. 1939 has used another
method for finding the limiting value of $(\eta/\eta_0 - 1)/c$. If we use the empirical equation
of Arrhenius, $\eta/\eta_0 = a^c$, we see that $d(\eta/\eta_0)/dc = a^c \ln a$, and the limit of this quantity at
infinite dilution (c = 0) is $\ln a$. Thus it is only necessary to evaluate $\ln a$ from the observed
data, and extrapolate this quantity to low concentration if any concentration dependence
is observed. This quantity $\ln a$ can easily be calculated from the equation:

$$\frac{1}{c}\,\ln\frac{\eta}{\eta_0} = \ln a$$

This method should be reliable if the results so obtained are extrapolated to zero concentration.
If used for finite concentrations, it involves the assumption that the Arrhenius
equation is correct. When extrapolated to zero concentration, the values obtained by
this method agree with those obtained by the extrapolation of the quantity $(\eta/\eta_0 - 1)/c$.

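The two routes to the limiting viscosity increment, extrapolating (η/η0 − 1)/c and evaluating (1/c) ln(η/η0), can be compared on synthetic data. The sketch below generates relative viscosities from an assumed Arrhenius law; the function names, the value of ln a and the concentrations are hypothetical and serve only to show that the two quantities converge at zero concentration.

```python
import math


def reduced_specific_viscosity(eta_rel, c):
    """(eta/eta0 - 1) / c at a finite concentration c."""
    return (eta_rel - 1.0) / c


def arrhenius_log_term(eta_rel, c):
    """(1/c) * ln(eta/eta0), the quantity ln a of the Arrhenius equation."""
    return math.log(eta_rel) / c


def intercept_at_zero(xs, ys):
    """Intercept of the least-squares straight line through (xs, ys)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return mean_y - (sxy / sxx) * mean_x


# Synthetic data obeying eta/eta0 = a**c with ln a = 0.004 per (g/L).
LN_A = 0.004
concs = [2.0, 4.0, 6.0, 8.0, 10.0]                  # g/L
eta_rels = [math.exp(LN_A * c) for c in concs]

reduced = [reduced_specific_viscosity(e, c) for e, c in zip(eta_rels, concs)]
log_terms = [arrhenius_log_term(e, c) for e, c in zip(eta_rels, concs)]

print(f"extrapolated (eta/eta0 - 1)/c: {intercept_at_zero(concs, reduced):.6f}")
print(f"mean of (1/c) ln(eta/eta0):    {sum(log_terms) / len(log_terms):.6f}")
```

With real measurements the logarithmic quantity would itself show some concentration dependence and would also need to be extrapolated, as the footnote points out.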
The problem of obtaining more accurate viscosity data in dilute 
protein solutions is a serious one. The effect of surface denaturation in 
solutions of dilute proteins is well known and may be responsible for 
some of the lack of agreement between workers. The effect of pH 
on electrolyte-free protein solution is of great importance, and probably 
the best procedure for viscosity measurements is to conduct them in 
the presence of about 0.2 N salt. Polson has shown³² that even large
pH changes fail to introduce errors in viscosity under these conditions. 

The effect of the velocity gradient used in determining the viscosity
is another possible source of error, particularly when dealing with
highly asymmetric molecules. It can easily be shown, however, that the
gradients usually used in capillary viscometers in the study of the
proteins being discussed here are much too small to be of
importance.³³


DIELECTRIC DISPERSION 


The study of dispersion of the dielectric constant is capable of
giving us data in regard to the size and shape of molecules in addition
to information concerning the dipole moment. This problem was discussed
during the recent dielectric symposium of the Academy,³⁴
particularly from the point of view of the dipole moments of protein
molecules. For the present discussion, it is probably best to think of
these data in terms of a rotary diffusion constant, $\Theta$, analogous to the
linear diffusion constant, D. The experimental data required are
dielectric constants measured for suitable solutions of low conductivity
over a fairly wide frequency range. By applying the equations of
Debye, as modified by recent treatments of Wyman, Onsager, Perrin
and others, we can analyze measurements of dispersion of the dielectric
constant into various critical frequencies, $\nu_c$, relaxation times, $\tau$ ($\tau = 1/2\pi\nu_c$),
or rotary diffusion constants, $\Theta$ ($\Theta = \pi\nu_c$). Having these
critical frequencies, we may apply Perrin’s equations and calculate
values of a hypothetical critical frequency, $\nu_0$, for a spherical molecule
of equal volume, from which we can evaluate the size of the molecule,
and the axial ratio, a/b, of an assumed ellipsoidal shape which may be
either a flattened or an elongated ellipsoid of revolution.

32 Polson, A. G. Kolloid. Zeit. 88: 51. 1939.

33 For a discussion of this error see Kroepelin, J. Kolloid. Zeit. 47: 300. 1929. It has
been briefly discussed by Edsall, J. T., & Mehl, J. W. Jour. Biol. Chem. 133: 418. 1940.
See also Polson.³¹

34 Oncley, J. L., Ferry, J. D., & Shack, J. Ann. N. Y. Acad. Sci. 40: 371. 1940.




The method briefly reviewed above has been applied to a considerable
number of the proteins under discussion here. In the case of egg
albumin, serum albumin, edestin and possibly lactoglobulin, it was
found that two critical frequencies were required to reproduce the
data obtained. In such cases, a comparison of experimental results
with calculations from Perrin’s equation gives us values for $\nu_0$, $\theta$ and
a/b if we assume that the molecule can be represented by an ellipsoid
of revolution. Here $\theta$ is the “dipole angle” of the molecule, defined
as the angle between the a axis of the assumed ellipsoid and the dipole
moment vector. The resulting size of the molecule is calculated from
$\nu_0$ (or $\tau_0$) and gives us a molecular volume, V, for the protein as it
exists in the solution (including whatever water of hydration, w grams
per gram of dry protein, may be considered rigidly bound to the
molecule). This molecular volume can then be compared with $M\bar{v}$ and
the amount of hydration computed:

$$V = \frac{RT}{6\pi\eta\nu_0} = \frac{RT\tau_0}{3\eta} = M\bar{v}\left(1 + \frac{w}{\bar{v}\rho}\right)$$

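The step from a critical frequency to a hydration estimate can be illustrated as follows. The sketch computes the molar volume from V = RT/(6πην0) and then solves V = Mv̄(1 + w/v̄ρ) for w; the function names, the modern value of R, and all sample inputs are assumptions made for illustration.

```python
import math

R = 8.314e7   # gas constant, erg/(mol K) (modern value; an assumption)


def relaxation_time(nu_c):
    """Relaxation time tau = 1 / (2 pi nu_c) for a critical frequency nu_c."""
    return 1.0 / (2 * math.pi * nu_c)


def hydrated_molar_volume(nu0, temp_k, eta):
    """Molar volume (mL per mole) of the equivalent sphere: V = RT / (6 pi eta nu0)."""
    return R * temp_k / (6 * math.pi * eta * nu0)


def hydration_from_volume(V, M, vbar, rho):
    """Solve V = M * vbar * (1 + w/(vbar*rho)) for w."""
    return rho * (V / M - vbar)


# Hypothetical case: build nu0 for a molecule with known w, then recover w.
M, vbar, rho, w_true = 44000.0, 0.749, 0.997, 0.30
temp_k, eta = 298.2, 0.00894
V_true = M * vbar * (1.0 + w_true / (vbar * rho))
nu0 = R * temp_k / (6 * math.pi * eta * V_true)

V = hydrated_molar_volume(nu0, temp_k, eta)
print(f"nu0 = {nu0:.3e} cycles/sec, tau0 = {relaxation_time(nu0):.3e} sec")
print(f"w recovered = {hydration_from_volume(V, M, vbar, rho):.3f}")
```

Because V enters linearly, an error of a few per cent in the critical frequency produces a much larger relative error in w when the hydration is small.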

In the case of hemoglobin, lactoglobulin, and insulin, we find that the
data can be well represented by a single relaxation time. This can be
explained by the assumption (1) of a spherical molecule (in
which case a/b = 1 and $\nu_b = \nu_0$); (2) that the other critical frequency
is associated with a dipole moment of such a small magnitude that it is
not observed; or (3) that the ellipsoid is of flattened shape. In any
of these circumstances the observed critical frequency, $\nu_c$, may be
either $\nu_a$, the critical frequency associated with the a axis, or $\nu_b$,
associated with the b axis. It is then necessary to calculate all possible
values of $\nu_0$, $\theta$ and a/b, and then evaluate all possible combinations
of axial ratio a/b and hydration w which could be used to explain the
observed dielectric dispersion curves.

The simple picture presented in the above paragraph is unfortunately 
only a first approximation. Although two critical frequencies can be 
chosen to represent the large portion of the dielectric constant versus 
log frequency curves, the agreement between calculated and observed 
curves is not as good as could be desired. This deviation is of a form 
to cause more of a smoothing in the case of the experimental curves 
than would be calculated. The choice of even a third critical fre- 
quency does not improve the agreement to any appreciable extent. A 
more likely explanation of this effect, however, can be suggested. 

In a solution of protein at its isoelectric point we would expect a 
large number of dipole species of identical size and shape, but of different
electrical symmetry. The number of such species will depend




upon the number of dissociating groups and their dissociation constants.³⁵
If we express the electrical symmetry in terms of electric
moments $\mu_a$ and $\mu_b$ along the two axes of an ellipsoid of revolution
we see that each species may have different values of these quantities,
depending upon the distribution of charged groups on the protein
molecule. For a spherical molecule this leads to no additional complications,
and an average moment is obtained. For an asymmetric
molecule, on the other hand, each species should make a contribution
to each of the separate dispersions, and the observed curve would be an
average of many curves of varying shapes, even though $\nu_a$ and $\nu_b$
remained constant, and would thus have lost much of the sharpness
of the individual curves. Zein seems to be a case in which the shape
can be rather simply calculated, since it contains so few dissociating
groups that at the isoelectric point there is probably a fairly simple
behavior.³⁶ Edestin, however, is perhaps a case where the data must
be treated in the more complex manner discussed here. Horse serum
pseudoglobulin-γ may be another case of this type.

The measurements of dielectric constant at frequencies above the
dispersion region can also be used to calculate the hydration of proteins.³⁷
The dielectric constant obtained under these conditions is due
almost entirely to solvent dipoles, and if we take the difference, $\Delta\epsilon_\infty$,
between the dielectric constant so measured, $\epsilon_\infty$, and that of the pure
solvent, $\epsilon^0$, we may use it to evaluate the amount of water associated
with each gram of anhydrous protein. Only a few measurements of
this type are available.


DOUBLE REFRACTION OF FLOW 


The measurement of the degree of orientation of molecules as 
measured by the double refraction of the solution caused by velocity 
gradients in the fluid has been attempted for many systems, and the 
data thus obtained can be used in the evaluation of the molecular 
asymmetry, by means of the rotary diffusion constant, $\Theta$.³⁸ Such
methods have been proven to be of great value in the study of some of 
the larger protein molecules, such as myosin and tobacco mosaic virus, 


35 Edsall, J. T., & Blanchard, M. H. Jour. Am. Chem. Soc. 55: 2337. 1933. Greenstein,
J. P. Jour. Biol. Chem. 93: 479. 1931. Linderstrøm-Lang, K. Compt. rend.
trav. lab. Carlsberg 15 (7): 1924. See also Shack, J. Ph.D. Dissertation, “Dielectric
Absorption of Protein Solutions,” Harvard University 1939.

36 Elliott, M. A., & Williams, J. W. Jour. Am. Chem. Soc. 61: 718. 1939. See also
Wyman, J. Jour. Biol. Chem. 90: 443. 1931; and Watson, C. C., Arrhenius, S., &
Williams, J. W. Nature 137: 322. 1936.

37 Oncley, J. L. Jour. Am. Chem. Soc. 60: 1122. 1938, note 43.




and have been adequately discussed elsewhere.³⁹ Measurements have recently
become available on one of the molecules under discussion here:
serum albumin solutions of various solubility have been investigated
in glycerol-water mixtures by Sadron,⁴⁰ who concludes that serum
albumin fractions of low solubility are fairly symmetrical. The case
of the more soluble albumins was complicated by the presence of
molecules of varying amounts (and sign) of birefringence. Measurements
on egg albumin by Boehm and Signer⁴¹ showed no large degree
of asymmetry. The application of this method to other proteins is 
being undertaken in our laboratory by Edsall and Mehl, and it is 
hoped that asymmetries can be obtained from this method which may 
be compared with those deduced on the same preparations by the 
other methods that have been discussed. 


PARTIAL SPECIFIC VOLUME 


The partial specific volume of the protein is involved in nearly all 
of the experiments which have been considered, has been used directly 
by Adair and others to measure the hydration of proteins, and is used 
in certain of the X-ray calculations of molecular weight discussed in 
another review of this conference. This quantity is evaluated from 
measurements of density of solutions of the protein under investiga- 
tion.42 It has been pointed out in the introduction to this conference 
that its value varies somewhat for various proteins, but is directly 
related to the amino acid composition of the protein.⁴⁴ Many proteins
have values between 0.745 and 0.750, but there are many cases falling 
outside this range, and the occasional practice of using specific volumes 
of related proteins, rather than carrying out the necessary density 
measurements, should be discontinued. This is especially true when 
it occurs in the function $(1 - \bar{v}\rho)$, since errors in $\bar{v}$ are magnified
approximately fourfold. 

38 The theoretical side of this problem has been treated by Boeder, P. Zeit. Phys. 15:
258. 1932. Kuhn, W. Kolloid. Zeit. 62: 269. 1933.

39 von Muralt, A. L., & Edsall, J. T. Jour. Biol. Chem. 89: 315. 1930. Mehl, J. W.
Cold Spring Harbor Symp. Quant. Biol. 6: 218. 1938. Lauffer, M. A., & Stanley, W. M.
Jour. Biol. Chem. 123: 507. 1938; Jour. Phys. Chem. 42: 935. 1938.

40 Sadron, C., Bonot, A., & Mosimann, H. Jour. chim. phys. 36: 78. 1939.

41 Boehm, G., & Signer, R. Helv. chim. acta 14: 1370. 1931.

42 Fankuchen, I. Ann. N. Y. Acad. Sci., this number.

43 See Kraemer’s section of Reference 1, pp. 57-62, for details of this measurement.

44 Cohn, E. J. Ann. Rev. Biochem. 4: 93. 1935; Ann. N. Y. Acad. Sci., this number.

It has been pointed out that molecular weights estimated by
sedimentation equilibrium or osmotic pressure are obtained by thermodynamic
methods and are unaffected by any hydration of the protein. Likewise
the sedimentation-diffusion method gives molecular weights certainly
very little affected by even very large amounts of hydration.
This can best be explained by the fact that the factors $M_a(1 - \bar{v}_a\rho)$
and $M_w(1 - \bar{v}_w\rho)$ are equal, where the subscripts a refer to properties
of the anhydrous molecules and the subscripts w to those of the
hydrated molecules. This problem has been discussed at length by
Lansing and Kraemer,⁴⁵ who also derive equations to show that these
same conditions hold at least approximately for three-component systems
with solvation.

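The equality of the buoyancy-weighted molecular weights for the anhydrous and hydrated molecule can be verified with a few lines of arithmetic. The sketch below assumes that the bound water has the density of the solvent, so that the hydrated molecule has weight M(1 + w) and a partial specific volume equal to the weighted mean; the function name and sample values are hypothetical.

```python
import math


def hydrated_properties(M_a, vbar_a, w, rho):
    """Weight and partial specific volume of the hydrated molecule.

    Assumes the w grams of bound water per gram of protein occupy w/rho mL.
    """
    M_w = M_a * (1.0 + w)
    vbar_w = (vbar_a + w / rho) / (1.0 + w)
    return M_w, vbar_w


M_a, vbar_a, rho = 66700.0, 0.749, 0.9982
for w in (0.0, 0.3, 1.0):
    M_w, vbar_w = hydrated_properties(M_a, vbar_a, w, rho)
    print(f"w = {w:.1f}: M(1 - vbar*rho) = {M_w * (1.0 - vbar_w * rho):.1f}")
```

The printed product is the same for every w, which is why hydration drops out of the molecular weight obtained from sedimentation and diffusion. If the bound water differed in density from the solvent, the equality would hold only approximately.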
Adair and others⁴⁶ have attempted to study the amount of hydration
by the measurement of protein crystal densities. They calculate
the number of grams of water associated with one gram of dry protein,
w, from the formula

$$w = \frac{\bar{v}_c - \bar{v}_a}{1/\rho - \bar{v}_c}$$

where $\bar{v}_c$ is the specific volume of the protein crystals, $\bar{v}_a$ is the specific
volume of anhydrous protein, and $\rho$ is the density of water. They
measured $\bar{v}_c$ by immersion in solutions of suitable density to just suspend
the protein crystals. These solutions consisted of such materials
as sucrose, ammonium sulfate, mono-sodium phosphate and potassium
citrate in aqueous solution. The objections to these methods are (1)
that they assume that the crystals consist of two components: A, composed
of anhydrous protein and water, and B, composed of dispersion
medium; (2) that the environment of the crystals is so changed in
the process of obtaining a suspending medium of proper density that the
densities so obtained are no longer those of the protein crystals under
normal conditions; and (3) that the “water of hydration” may not
have the same density as water in its usual state. (This third objection
is equally true of all the methods described for the calculation of
hydration from values of molecular volumes.)

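The crystal-density formula follows from treating the crystal as one gram of anhydrous protein plus w grams of water of density ρ, and it can be checked by a round trip. The sketch below is illustrative only; the function names and sample values are hypothetical.

```python
def crystal_specific_volume(w, v_a, rho):
    """Specific volume of a crystal holding w grams of water per gram of protein."""
    return (v_a + w / rho) / (1.0 + w)


def crystal_hydration(v_c, v_a, rho):
    """Grams of water per gram of dry protein from the crystal specific volume."""
    return (v_c - v_a) / (1.0 / rho - v_c)


# Hypothetical crystal: anhydrous specific volume 0.749 mL/g, 0.35 g water per g.
v_c = crystal_specific_volume(w=0.35, v_a=0.749, rho=0.9982)
print(f"crystal specific volume = {v_c:.4f} mL/g")
print(f"recovered hydration w   = {crystal_hydration(v_c, 0.749, 0.9982):.3f}")
```

The denominator 1/ρ − v̄c becomes small as the crystal approaches the density of water, so the computed hydration is then very sensitive to errors in the measured crystal density.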

MOLECULAR WEIGHT 


It remains to attempt an evaluation of the data available for the
proteins chosen for discussion in this conference. The measurements
to be considered are compiled in TABLE 1, and the accompanying notes.
X-ray estimates of molecular weight are considered by Fankuchen.⁴²
The case of each of the molecules can perhaps be considered individually.

45 Lansing, W. D., & Kraemer, E. O. Jour. Am. Chem. Soc. 58: 1471. 1936. See
also reference 1, pp. 62-66.

46 Adair, G. S., & Adair, M. E. Proc. Roy. Soc. B120: 422. 1936. See also Adair,
G. S. Proc. Roy. Soc. B127: 18. 1939.




Insulin 


All the available ultracentrifuge data on insulin were obtained 
prior to the recent revision of sedimentation equilibrium values. The 
value of 35,100 by sedimentation equilibrium may therefore be changed. 
The diffusion constant has been measured by the latest methods, and 
the sedimentation velocity constants have not appreciably changed 
since 1931. The value of 41,000 may thus tentatively be taken as the 
more likely of the two given here. 


Lactoglobulin 


The available data on lactoglobulin are all of quite recent origin.
A value of 40,000 ± 2,000 is in good agreement with both of these
results.


Chymotrypsin and chymotrypsinogen 


These proteins have been crystallized by Northrop and his co- 
workers, who have estimated molecular weight by osmotic pressure 
methods. The ultracentrifugal molecular weights of these substances 
are being studied by J. W. Williams and his co-workers, and additional 
data will soon be available, for their more accurate evaluation. 


Pepsin 


The values for pepsin are of recent origin, unpublished to a large
extent, and all fall within a value of 37,000 ± 2,000.


Egg Albumin 


Choice of a value for the molecular weight of egg albumin is at 
present somewhat difficult. Both the sedimentation velocity and dif- 
fusion constants seem quite well established, and lead to a molecular 
weight close to 44,000, somewhat higher than that given by the equilibrium
ultracentrifuge. Sørensen in 1917 estimated the molecular
weight of this protein to be 34,000 from osmotic pressure measurements.
Adair⁴⁷ in speaking of the osmotic pressure results refers to
“a recalculation based on Sørensen’s data made by Adair which gave
the value 43,000. The same figure was obtained by Marrack and 
Hewitt from measurements of osmotic pressure. Taylor, Adair and 
Adair using the same method obtained the value 46,000 for material 
which had been recrystallized six times.” 


47 Adair, G. S. Ann. Rev. Biochem. 6: 180. 1937.




Serum Albumin (Horse) 


All of the recent data on this protein seem in good agreement with 
the figure 70,000 ± 3,000. The new fractionation of serum albumin
by McMeekin⁴⁸ reopens this problem to a certain extent.


Hemoglobin (Horse) 


We may take 66,700, the value as obtained from chemical analysis, 
for the molecular weight of hemoglobin. The value from sedimentation
and diffusion is about 5% lower than this figure, but is complicated
by the fact that extrapolations are difficult to make because dissociation
seems to occur at concentrations below 0.8 per cent, and that the
optical analysis of hemoglobin solutions by the refractive index
methods is somewhat less precise than is usually the case because
the photographic work must be done in the red or yellow portion
of the spectrum.

Edestin 


Until further details are published, we probably should accept the 
value of 310,000 for the molecular weight of this protein. A new 
determination by sedimentation equilibrium would be of considerable 
value in confirming the newer result from sedimentation velocity and 
diffusion. 

Excelsin 


In view of the difficulties involved in the osmotic pressure measure- 
ments in the case of such a large molecule the value from sedimentation 
and diffusion must be considered the most accurate that we possess, 
though it is to be hoped that it, like edestin, will again be studied in 
the equilibrium ultracentrifuge. 


Bushy Stunt Virus 


The molecular weight of this material is best taken as 7,600,000, 
on the basis of sedimentation equilibrium measurements. It would 
be interesting to have a direct measurement of the diffusion constant 
of this virus. 


ASYMMETRY AND HYDRATION 


The calculation of asymmetry by most of the methods that we have 
discussed is complicated by the effect of hydration. In order to include 
this effect, we have attempted to present the data on asymmetry and 
hydration in graphic form, the same coordinates being adopted as 


4s McMeekin, T. L. Jour. Am. Chem. Soc. 61: 2884. 1939; 62: 3393. 1940. 


ONCLEY: THE SIZE AND SHAPE OF PROTEIN MOLECULES 141 


FIGURE 3. Asymmetry and hydration of certain protein molecules. Panels include Serum Albumin, Edestin, Lactoglobulin, Hemoglobin, and Insulin. Axes: axial ratio, a/b; hydration, grams of water per gram of protein. Regions shown: from frictional ratio (assuming ± 4% error); from viscosity coefficient (assuming ± 10% error); from X-ray studies; from crystal density (assuming ± 25% error); from dielectric dispersion with one relaxation time; from dielectric dispersion with two relaxation times.


were previously employed in FIGURES 1 and 2, to present the data from 
diffusion, viscosity, dielectric dispersion, crystal density and X-ray 
diffraction studies (FIGURE 3). The diffusion and viscosity results were 
plotted by taking the values for f/f0 and (η/η0 − 1)1000/v̄c recorded 




in TABLE 1, and arbitrarily choosing probable experimental errors of 
± 4 per cent for f/f0 and ± 10 per cent for (η/η0 − 1)1000/v̄c. 
Results from dielectric measurements were calculated from the relaxation 
times recorded in TABLE 1, with as reasonable estimates of probable 
error as possible. The results from crystal density measurements, for 
which a probable error of ± 25% is arbitrarily assumed, are taken from Adair and 
Adair.4¢ The sources of X-ray data, and their reliability, are discussed 
by Fankuchen.*” # By a study of the diagrams of FIGURE 3, we can 
best see just how well the various methods seem to agree. 
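
The construction of such a diagram from a single frictional ratio can be sketched in a few lines of Python. The sketch is illustrative only and is not the procedure by which FIGURE 3 was drawn: it assumes that f/f0 factors into a shape term, given by Perrin's expression for an elongated ellipsoid of revolution, and a hydration term (1 + w/v̄ρ)^(1/3). The value f/f0 = 1.16, the partial specific volume 0.75, the solvent density 1.0, and all function names are assumptions chosen for the example.

```python
import math

def perrin_prolate(axial_ratio):
    """Shape factor f/fe of an unhydrated elongated ellipsoid, a/b >= 1."""
    q = 1.0 / axial_ratio                       # b/a, at most 1
    root = math.sqrt(1.0 - q * q)
    if root == 0.0:                             # sphere
        return 1.0
    return root / (q ** (2.0 / 3.0) * math.log((1.0 + root) / q))

def hydration_factor(w, v_bar=0.75, rho=1.0):
    """Factor by which w grams of water per gram of protein raises f/f0."""
    return (1.0 + w / (v_bar * rho)) ** (1.0 / 3.0)

def axial_ratio_for(f_ratio, w, v_bar=0.75, rho=1.0):
    """Axial ratio a/b reproducing f_ratio at hydration w, by bisection.

    Returns None when the hydration alone would exceed the observed ratio.
    """
    shape_part = f_ratio / hydration_factor(w, v_bar, rho)
    if shape_part < 1.0:
        return None
    lo, hi = 1.0, 1000.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if perrin_prolate(mid) < shape_part:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    # assumed frictional ratio of 1.16, tabulated against assumed hydration
    for w in (0.0, 0.2, 0.4, 0.6):
        print(w, axial_ratio_for(1.16, w))
```

For a fixed f/f0 the admissible axial ratio falls as the assumed hydration rises, which is why a single measurement with its probable error defines a band rather than a point in the diagram.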


Insulin 


Studies of diffusion-sedimentation velocity and X-ray diffraction 
have been made.50 The X-ray results seem to favor strongly a flattened 
ellipsoidal model, and the values of a/b = 0.6, w = 0.2, seem 
in good agreement with both methods. 


Lactoglobulin 


Results from diffusion-sedimentation velocity, viscosity, dielectric 
dispersion, and X-ray diffraction, are available. The method of di- 
electric dispersion seems to show a slight deviation from a single re- 
laxation time curve, and the values a/b = 4, and w = 0.3 would appear 
most likely from this work; these values would agree with diffusion 
and viscosity data but are in poor agreement with X-ray results. 
If the small deviation suggested by the dielectric dispersion is neglected, 
then values of a/b ca. 1, and w = 0.6, taken from the X-ray 
results, would be in agreement with the single observed relaxation time, 
as well as all other data, but there are no data to support such large 
estimates of hydration. 


49 We wish to acknowledge the cooperation of Dr. I. Fankuchen in the preparation of 
the X-ray estimates in FIGURE 3, and the privilege of seeing the manuscript of several 
unpublished papers of Drs. Dorothy Crowfoot and Dennis P. Riley, and the Ph.D. Dissertation 
(University of Cambridge, 1939) of Dr. M. F. Perutz, entitled “The Crystal 
Structure of Horse Methaemoglobin.” The values for asymmetry as calculated from 
X-ray measurements depend upon 1) the distribution of the molecular centers; 2) the 
assumption that the molecules closely approximate ellipsoids of revolution; and 3) the 
arrangement chosen for the packing of these ellipsoids. The estimation of the distribu- 
tion of the molecular centers (1) depends upon several factors: for crystals containing a 
small number of molecules per unit cell (insulin and hemoglobin) this distribution of 
molecular centers is fixed by the symmetry of the crystal, whereas for crystals containing 
large numbers of molecules per unit cell (such as lactoglobulin) this distribution can be 
determined only by a complicated study of the X-ray intensities. The choice of arrange- 
ment (3) is influenced by such guides as symmetry considerations, the character of peaks 
in Fourier projections, etc., and may be subject to considerable error. 

50 Dielectric dispersion results in propylene glycol, the only solvent in which insulin has 
thus far been studied by this method, cannot be explained on the basis of a molecular 
weight of 40,000 (see appendix to TABLE 1). 




Pepsin 
The data from diffusion-sedimentation velocity and from viscosity 
are in poor agreement, and the interpretation of X-ray results is un- 
certain and has not been considered. No most probable values can 
be suggested from the available data. 


Egg Albumin 


Diffusion-sedimentation, viscosity, dielectric dispersion, and crystal 
density studies are available for egg albumin. The values a/b = 4 
and w= 0.2 would seem to agree quite well with all of the available 
data. 

Serum Albumin (Horse) 


Results from diffusion-sedimentation, viscosity, dielectric dispersion, 
and crystal density have been considered. The values a/b = 5, and 
w = 0.2 agree well with these data, but it must be remembered that 
these various measurements have been made on serum albumin frac- 
tions that were prepared by several methods and may have had 
varying axial ratios and hydrations. 


Hemoglobin (Horse) 


This protein has been studied by all of the available methods, and 
values of a/b ca. 1.6, w = 0.3 seem in good agreement with these data. 
There is some uncertainty in the f/f0 value, however, as discussed in 
the appendix of TABLE 1. 

Edestin 


Measurements of crystal density, dielectric dispersion and diffusion- 
sedimentation velocity are available, and are not in very good agreement. 
The f/f0 value chosen may be too low, and the value of a/b 
from dielectric dispersion might be too large. X-ray studies are dif- 
ficult to interpret and have been omitted from consideration. An 
elongated molecule with w about 0.1 and a/b between 4 and 8 would 
appear most probable. 


Chymotrypsin, Chymotrypsinogen, Excelsin, and Bushy-Stunt Virus 


These molecules have been studied only by diffusion (f/f0) and 
thus cannot be critically examined.51 The axial ratio vs. hydration 
diagrams can be obtained from FIGURE 1, and are not redrawn. 


51 Some X-ray data do exist on chymotrypsin, but their interpretation in terms of 
asymmetry is uncertain. 




ACKNOWLEDGMENT 


I should like to express my deepest thanks to the many persons who 
have been of incalculable aid in the preparation of this review. To 
Professor J. W. Williams and to Drs. H. P. Lundgren and W. B. Bridg- 
man, with whom I had the opportunity of discussing considerable 
portions of an early draft of this paper, I am especially grateful. Mr. 
E. E. Hess and Professor Williams were kind enough to furnish un- 
published data on chymotrypsin, as was Professor J. Wyman, and 
Dr. E. G. Pickels on egg albumin. Professor J. T. Edsall and Drs. J. 
D. Ferry and J. W. Mehl have been of assistance in the preparation of 
many parts of this review, and have contributed unpublished data 
and calculations. And lastly, the aid of Professor E. J. Cohn has been 
indispensable throughout. 


NOTES FOR TABLE 1 
Sedimentation Equilibrium 


INSULIN.—The average of three values from Sjögren and Svedberg 
(49). Svedberg and Pedersen state (57) that these results 
give “. . . probably too low a value, since not sufficient time was 
allowed for the equilibrium to be established.” 

LACTOGLOBULIN.—The average of about sixteen experiments with 
extremes from 36,000 to 42,500 from Pedersen (38). 

PEPSIN.—From Eriksson-Quensel (13). Philpot and Eriksson-Quensel 
(41) gave 33,500 as the mean of two experiments. 

EGG ALBUMIN.—From Pedersen (39). Svedberg and Pedersen (57) 
state, “When the light-absorption method was used, values from 40,000 
to 46,000 were generally found, but the agreement between the dif- 
ferent experiments was not very good. With the refractive index 
method the single experiments agreed better, and as a mean value, 
M, = 40,500 was found.” This value replaces a former value of about 
34,500 obtained from work of Svedberg and Nichols (55), Nichols (32), 
Sjögren and Svedberg (48) and Svedberg and Eriksson (53), since, 
“It seems . . . as if the duration of their experiments have been too 
short to secure real sedimentation equilibrium, so that the molecular 
weight calculated by them is too small” (57). 

SERUM ALBUMIN.—From Pedersen (39). This value checks well 
with an older value of 68,150 ± 2,000 obtained by Svedberg and 
Sjögren (58). 

HEMOGLOBIN.—From Pedersen (39). 




TABLE 1 
SIZE AND SHAPE OF CERTAIN PROTEIN MOLECULES* 

Columns: Insulin; Lactoglobulin; Chymotrypsin; Chymotrypsinogen; Pepsin; Egg Albumin; Serum Albumin; Hemoglobin; Edestin; Excelsin; Bushy Stunt Virus. 

Rows: Molecular Weight (Equilibrium Centrifuge; Osmotic Pressure; Sedimentation-Diffusion); Sedimentation Constant, s20 × 10^13; Diffusion Constant, D20 × 10^7; Partial specific volume, v̄; Relaxation Times, τ × 10^8; Viscosity Coefficient, (η/η0 − 1)1000/(v̄c); Frictional ratio, f/f0. 

(The table is printed sideways in the original; its numerical entries are not recoverable from the scan.) 

* See accompanying notes. 
X-ray measurements have been made on tobacco seed globulin in addition to the other molecules recorded here. The only measurement on this molecule which should be recorded here was s20 × 10^13 = 12.7 (see notes to TABLE 1). 




EDESTIN.—A value of 208,000 was given by Svedberg and Stamm 
(60), but is no longer given in tables published from Upsala. Svedberg 
and Pedersen (57) state that “. . . the duration of these runs 
was too short to secure equilibrium. . . .” 

EXCELSIN.—A value of 212,000 was given by Svedberg and Sjögren 
(59), but is subject to the same remarks just made concerning edestin. 

BUSHY-STUNT VIRUS.—From McFarlane and Kekwick (26). It 
represents an average from three experiments which vary between 
extremes of 7,100,000 and 8,100,000. 


Osmotic Pressure 


CHYMOTRYPSIN.—From Kunitz and Northrop (22). 
CHYMOTRYPSINOGEN.—From Kunitz and Northrop (22). 
PEPSIN.—From Northrop (33). 

EGG ALBUMIN.—Sørensen (50) gave 34,000 in aqueous solution, a 
value which has been revised by Adair (3) to give 43,000. Burk and 
Greenberg (9) gave 36,000 in aqueous solution and 36,000 in urea 
solution. Marrack and Hewitt (24) gave 43,000 in aqueous solution, 
and Taylor, Adair and Adair (61) gave 46,000. 

SERUM ALBUMIN.—Sørensen (51) gave 45,000, which Burk (7) showed 
was due to the deviation from an ideal solution. When corrected for 
this deviation, Burk gave 76,000 from these measurements; Adair (1) 
gave 62,000; Adair and Robinson (4) gave 72,000. Burk (7) found 
75,000 and a similar value (73,000) for denatured material. Roche, 
Dorier and Marquet (46) gave 69,000 for human serum albumin. 

HEMOGLOBIN.—Adair (2) gave 67,000 in aqueous solution. Burk 
and Greenberg (9) gave 66,500 in 6.5 M glycerol solution. Values 
obtained in urea are approximately one-half of these values. 

EXCELSIN.—From Burk (8). 


Sedimentation Velocity and Diffusion 


INSULIN.—v̄ and s from Sjögren and Svedberg (49). s is the average 
of five experiments with extremes of 3.41 and 3.55. D from Polson 
(44), and represents the average of three experiments. 

LACTOGLOBULIN.—v̄ and s from Pedersen (38). D from Lamm and 
Polson (23), Polson (44), and values of Polson given by Pedersen (38). 
Both s and D are functions of pH, but give constant molecular 
weights (s/D = constant) from pH 3 to 9.5. At constant pH the 
measurements of s vary about ± 3%, and of D ± 1%. 

CHYMOTRYPSIN.—D from Kunitz and Northrop (22), and was obtained 
by the porous disc method. The value reported here was 




obtained by multiplying their corrected values by 1.11, a factor derived 
by a careful examination of the porous disc results of McBain (25), 
Mehl (28), Polson (43) and Anson and Northrop (5). (We are in- 
debted to Dr. J. W. Mehl for aid in this examination.) 

CHYMOTRYPSINOGEN.—D and s are preliminary results of Hess and 
Williams (19). v̄ is assumed to be 0.75. 

PEPSIN.—D, s and v̄ are largely from work of Polson (44) and 
Eriksson-Quensel (13). Philpot and Eriksson-Quensel (41) gave 
s = 3.3 (mean of seven determinations with estimated standard deviation 
of 0.15). Herzog (18) gave D = 9.19 in 1907. 

EGG ALBUMIN.—D from Lamm and Polson (23) and Polson (44). 
Also in good agreement with the value 7.7 given by Tiselius and 
Gross (62). v̄ from Nichols (32) and Svedberg and Nichols (55). 
s from Pedersen (39). Sjögren and Svedberg (48) gave s = 3.54, 
Svedberg and Eriksson (53) gave 3.4, Williams and Watson (63) gave 
3.6 and Pickels (42) found 3.5. Earlier measurements of Svedberg 
and Nichols (55) and Nichols (32) gave 3.31, and indicated difficulties 
in the proper evaluation of the molecular weight by sedimentation 
velocity and diffusion data. Herzog (18) gave 7.54 for D in 1907. 

SERUM ALBUMIN.—D from Polson (44), the measurements varying 
by ± 20%. Also from Kekwick (20). A value of 6.45 was given by 
Lamm and Polson (23). s from Mutzenbecher (30) and from Kekwick 
(20). v̄ is from Svedberg and Sjögren (58) who found s = 4.4 
and D = 5.8 (measured in ultracentrifuge by old technique). Two 
different serum albumin fractions of McMeekin (27) gave s = 4.1. 

HEMOGLOBIN.—v̄ from Svedberg and Fahraeus (54). s from Pedersen 
and Andersson (40). A value of 4.63 was given by Steinhardt 
(52). Svedberg and Nichols (56) gave s = 4.5. Polson (44) and 
Lamm and Polson (23) gave D = 6.9 from measurements of human 
hemoglobin, and values for that of the horse a few per cent higher. 
The value 6.3 (57) was obtained by Tiselius and Gross (62), but would 
appear from these newer results to be too small. We have used 
D = 6.9 in our calculations. A value of 6.0 was also given by Svedberg 
and Nichols (56) as measured in the ultracentrifuge. 

EDESTIN.—s and v̄ from Svedberg and Stamm (60). Preliminary 
measurements by Oncley (35) give a somewhat larger value of s = 14.6. 
D is from Polson (44). Using the old method of measurement in the 
ultracentrifuge, Svedberg and Stamm (60) gave D = 5.6. 

EXCELSIN.—D from Polson (44). s is the revised value (57) obtained 
from the experiments of Svedberg and Sjögren (59), who reported 
11.8 as the average of nine experiments. v̄ was obtained by 
these workers. 




BUSHY-STUNT VIRUS.—s and v̄ from McFarlane and Kekwick (26). 
v̄ was the mean of two values (0.743 and 0.723), and s of eleven values 
varying from 138 to 158. 

TOBACCO SEED GLOBULIN.—A single value of 12.7 for the sedimentation 
velocity constant of tobacco seed globulin was measured by 
Philpot and published by Crowfoot and Fankuchen (12). They as- 
sumed that this molecule was similar to edestin, excelsin, and amandin, 
and took an approximate molecular weight of 300,000. 
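
The arithmetic behind the sedimentation-diffusion molecular weights can be illustrated with the Svedberg relation, M = RTs/D(1 − v̄ρ). The sketch below is only an illustration: it combines s = 3.54 and D = 7.7, quoted above for egg albumin, with an assumed v̄ of 0.75 and the density of water at 20°, and the function name is hypothetical.

```python
R_GAS = 8.314e7          # gas constant in erg per mole per degree

def svedberg_molecular_weight(s, D, v_bar, temperature=293.15, rho=0.9982):
    """M = R T s / (D (1 - v_bar * rho)), all quantities in c.g.s. units.

    s in seconds, D in cm^2/sec, v_bar in cm^3/g, rho in g/cm^3.
    The default temperature and density (water at 20 degrees) are assumptions.
    """
    return R_GAS * temperature * s / (D * (1.0 - v_bar * rho))

# Egg albumin: s and D as quoted in the notes, v_bar assumed to be 0.75
M = svedberg_molecular_weight(s=3.54e-13, D=7.7e-7, v_bar=0.75)
print(round(M, -2))
```

With these inputs the result falls between 44,000 and 45,000. Since M depends on s and D only through their ratio, a constant s/D at constant v̄ implies a constant molecular weight.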


Relaxation Times from Dielectric Dispersion 


INSULIN.—From Cohn, Ferry, Livingood and Blanchard (11). This 
value was obtained in propylene glycol solutions with from 0 to 20% 
water present, and was corrected to values for water at 25°. The 
values obtained with the various water contents all gave the same 
value for the corrected relaxation time (+ 10%) in spite of a 100% 
change in viscosity. The value obtained for the relaxation time is 
too small to be explained on the basis of a molecular weight of 
35,000-40,000, however, and further work is in progress on this 
problem. 

LACTOGLOBULIN.—Based on measurements of Ferry, Cohn, Oncley 
and Blanchard (15), of Ferry and Oncley (17), and of Shack (47) (by 
the calorimetric method). See also Oncley, Ferry and Shack (37). 

EGG ALBUMIN.—Measurements of Oncley (36) and Shack (47). 
See Oncley, Ferry and Shack (37). 

SERUM ALBUMIN.—Measurements by Oncley (35) on serum albumin 
fractions prepared by McMeekin (27). Ferry and Oncley (16) gave 
approximately the same results on less highly purified fractions. 

HEMOGLOBIN.—Measurements on horse hemoglobin by Oncley (34) 
and by Shack (47). See also Oncley, Ferry and Shack (37). A value 
of 17 × 10⁻⁸ for the relaxation time of pig hemoglobin was obtained 
by Arrhenius (6). 

EDESTIN.—Measurements of Oncley (35). The results were complicated 
by the presence of about 10% of a higher molecular weight 
substance. 


Viscosity 


LACTOGLOBULIN.—From Polson (45). The data extend from 1 to 
5% protein concentration. 


PEPSIN.—From Polson (45). The concentration range was from 
1 to 3%. 




EGG ALBUMIN.—Polson (45) gave 5.8, while measurements of Mehl, 
Blanchard and Walker (29) gave the somewhat lower result of 5.2. 
The value in TABLE 1 is an average of these results. 

Serum ALBUMIN.—Obtained by consideration of results published 
by Fahey and Green (14), Neurath and Saum (31) and a single point 
of Polson (45). 

HEMOGLOBIN.—Polson (45) gave 5.3, while measurements of Cohn 
and Prentiss (10) and of Kunitz, Anson and Northrop (21) lead to 
somewhat lower results. The value given in TABLE 1 is an average 
of these results. 


References for Notes on Table 1 


1 Adair, G. S. Skand. Arch. Physiol. 49: 76. 1926. 
2 Adair, G. S. Proc. Roy. Soc. London A108: 627. 1926; 120: 573. 1928. 
3 Adair, G. S. Ann. Rev. Biochem. 6: 180. 1937; Jour. Am. Chem. Soc. 49: 2524. 1927. 
4 Adair, G. S., & Robinson, M. E. Biochem. Jour. 24: 1864. 1930. 
5 Anson, M. L., & Northrop, J. H. Jour. Gen. Physiol. 20: 575. 1937. 
6 Arrhenius, S. Physik. Zeit. 39: 559. 1938. 
7 Burk, N. F. Jour. Biol. Chem. 98: 353. 1932. 
8 Burk, N. F. Jour. Biol. Chem. 120: 63. 1937. 
9 Burk, N. F., & Greenberg, D. M. Jour. Biol. Chem. 87: 197. 1930. 
10 Cohn, E. J., & Prentiss, A. M. Jour. Gen. Physiol. 8: 619. 1927. 
11 Cohn, E. J., Ferry, J. D., Livingood, J. J., & Blanchard, M. H. Science 90: 183. 1939. 
12 Crowfoot, D., & Fankuchen, I. Nature 141: 521. 1938. 
13 Eriksson-Quensel, I.-B. unpublished work. 
14 Fahey, K., & Green, A. A. Jour. Am. Chem. Soc. 60: 3039. 1938. 
15 Ferry, J. D., Cohn, E. J., Oncley, J. L., & Blanchard, M. H. Jour. Biol. Chem. 128: Proc. 28. 1939. 
16 Ferry, J. D., & Oncley, J. L. Jour. Am. Chem. Soc. 60: 1123. 1938. 
17 Ferry, J. D., & Oncley, J. L. Jour. Am. Chem. Soc. 63: 272. 1941. 
18 Herzog, R. O. Zeit. Elektrochem. 13: 533. 1907. 
19 Hess, E. E., & Williams, J. W. unpublished work. 
20 Kekwick, R. A. Biochem. Jour. 32: 552. 1938. 
21 Kunitz, M., Anson, M. L., & Northrop, J. H. Jour. Gen. Physiol. 17: 365. 1934. 
22 Kunitz, M., & Northrop, J. H. Jour. Gen. Physiol. 18: 433. 1935; 20: 575. 1937. 
23 Lamm, O., & Polson, A. G. Biochem. Jour. 30: 528. 1936. 
24 Marrack, J., & Hewitt, L. F. Biochem. Jour. 23: 1079. 1929. 
25 McBain, J. W., & Liu, T. H. Jour. Am. Chem. Soc. 53: 59. 1931; McBain, J. W., & Dawson, C. R. Proc. Roy. Soc. London A148: 32. 1935. 
26 McFarlane, A. S., & Kekwick, R. A. Biochem. Jour. 32: 1607. 1938. 
27 McMeekin, T. L. Jour. Am. Chem. Soc. 61: 2884. 1939; 62: 3393. 1940. 
28 Mehl, J. W., & Schmidt, C. L. A. Univ. Cal. Publ. Physiol. 8: 165. 1937. 
29 Mehl, J. W., Blanchard, M. H., & Walker, P. H. unpublished work. 
30 Mutzenbecher, P. v. Biochem. Zeit. 266: 226, 250, 259. 1933. 
31 Neurath, H., & Saum, A. M. Jour. Biol. Chem. 128: 347. 1939. 
32 Nichols, J. B. Jour. Am. Chem. Soc. 52: 5176. 1930. 
33 Northrop, J. H. Jour. Gen. Physiol. 13: 739. 1930. 
34 Oncley, J. L. Jour. Am. Chem. Soc. 60: 1115. 1938. 
35 Oncley, J. L. Jour. Phys. Chem. 44: 1103. 1940. 
36 Oncley, J. L. unpublished work. 
37 Oncley, J. L., Ferry, J. D., & Shack, J. Ann. N. Y. Acad. Sci. 40: 371. 1940. 
38 Pedersen, K. O. Biochem. Jour. 30: 961. 1936. 
39 Pedersen, K. O. unpublished work. 
40 Pedersen, K. O., & Andersson, K. J. I. unpublished work. 
41 Philpot, J. St. L., & Eriksson-Quensel, I.-B. Nature 132: 932. 1933. Philpot, J. St. L. Biochem. Jour. 29: 2458. 1935. 




42 Pickels, E. G. unpublished work. 
43 Polson, A. G. Biochem. Jour. 31: 1903. 1937. 
44 Polson, A. G. Kolloid. Zeit. 87: 149. 1939. 
45 Polson, A. G. Kolloid. Zeit. 88: 51. 1939. 
46 Roche, J., Dorier, M., & Marquet, F. Compt. rend. Soc. Biol. 119: 1150. 1935. 
47 Shack, J. Ph.D. Dissertation, Harvard University. 1939. 
48 Sjögren, B., & Svedberg, T. Jour. Am. Chem. Soc. 52: 5187. 1930. 
49 Sjögren, B., & Svedberg, T. Jour. Am. Chem. Soc. 53: 2657. 1931. 
50 Sørensen, S. P. L. Compt. rend. trav. lab. Carlsberg 12: 262. 1917. 
51 Sørensen, S. P. L. “Proteins.” New York. 1925. 
52 Steinhardt, J. Jour. Biol. Chem. 123: 543. 1938. 
53 Svedberg, T., & Eriksson, I.-B. Jour. Am. Chem. Soc. 56: 409. 1934. 
54 Svedberg, T., & Fahraeus, R. Jour. Am. Chem. Soc. 48: 430. 1926. 
55 Svedberg, T., & Nichols, J. B. Jour. Am. Chem. Soc. 48: 3081. 1926. 
56 Svedberg, T., & Nichols, J. B. Jour. Am. Chem. Soc. 49: 2920. 1927. 
57 Svedberg, T., & Pedersen, K. O. “The Ultracentrifuge.” Oxford Univ. Press, London. 1940. 
58 Svedberg, T., & Sjögren, B. Jour. Am. Chem. Soc. 50: 3318. 1928. 
59 Svedberg, T., & Sjögren, B. Jour. Am. Chem. Soc. 52: 279. 1930. 
60 Svedberg, T., & Stamm, A. J. Jour. Am. Chem. Soc. 51: 2170. 1929. 
61 Taylor, G. L., Adair, G. S., & Adair, M. E. Jour. Hygiene 32: 340. 1932. 
62 Tiselius, A., & Gross, D. Kolloid. Zeit. 66: 11. 1934. 
63 Williams, J. W., & Watson, C. C. Nature 139: 506. 1937. 


THE X-RAY DIFFRACTION METHODS USED 
IN PROTEIN STUDIES 


By B. E. WARREN 


From the George Eastman Laboratory of Physics, Massachusetts Institute of Technology, 
Cambridge, Massachusetts 


The following discussion is intended primarily for those who are 
not specialists in X-ray diffraction methods, and is planned to give 
an elementary picture of the ideas involved, the methods used, the 
kind of information obtained, and the reliability of the conclusions 
which can be drawn. 

A crystal is a three-dimensional scheme of repetition, built up by 
some unit of structure repeating itself identically at regular intervals 
in three dimensions. The scheme of repetition is specified by three 
axes, a, b, c, which give the shortest translations between identical 
points. FIGURE 2a indicates schematically in two dimensions the repeating 
nature of a crystalline structure. The volume of the parallelopipedon 
defined by the three axes, a, b, c, is called the unit cell. 
This is the smallest volume of crystal which repeated can produce the 
crystal as a whole. The problem of crystal structure analysis divides 
into two stages: the determination of the size and shape of the unit 
cell (axes a, b, c, and angles); and the positions of all the atoms 
within the unit cell. 

For complex crystals such as proteins, the single crystal rotation 
method is generally used. The method is indicated schematically in 
FIGURE 1. A single crystal is mounted so that it can be rotated about 
one of its crystallographic axes, say c. A monochromatic X-ray beam 
(usually the Kα line of copper, λ = 1.539 Å) is collimated by passing 
through two pin holes. The X-ray beam is scattered by every atom 
in the crystal, and the secondary waves coming from the atoms will 
show constructive interference in certain directions, and will cancel 
out in other directions. A series of diffracted beams will result, and 
these beams are recorded on either a cylindrical film or a flat film. 
If the experimental arrangement is as indicated in FIGURE 1, the diffracted 
beams will all lie on a set of cones, which intersect a cylindrical 
film to give a series of layer lines. 

Measuring the layer line separation $y_l$, one computes the angle $\beta_l$ by 
the relation 

$$\tan \beta_l = \frac{y_l}{R}$$






The length of the crystallographic axis about which the crystal was 
rotated is then given by 

$$c = \frac{l\lambda}{\sin \beta_l} \qquad (1)$$

Knowing the wave length λ, and measuring the layer line separation, 
one gets directly the length of the axis. The other two axial lengths 
are then obtained by rotation patterns rotating around the corresponding 
axes. The only possibility of mistake comes from overlooking 
any very weak layer lines, but this is not likely to happen often. 


FIGURE 1. Schematic arrangement of X-ray beam and crystal in making rotation 
patterns of single crystals. 


The dimensions of the unit cell are therefore obtained directly from 
the geometry of the diffraction pattern, and in general can be taken 
as absolutely reliable. 
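
As an illustration of how an axial length follows from the layer lines, the short Python sketch below evaluates tan β = y/R and c = lλ/sin β for each layer line. It assumes a cylindrical film of radius R coaxial with the rotation axis and takes y as the height of the l-th layer line above the equator; the film radius and the measured heights are invented for the example.

```python
import math

def axis_length(layer_index, layer_height, film_radius, wavelength=1.539):
    """Repeat distance along the rotation axis from one layer line.

    layer_height and film_radius share one unit (e.g. mm); the result is
    in the unit of the wavelength (Angstroms for the default value).
    """
    beta = math.atan2(layer_height, film_radius)     # tan(beta) = y / R
    return layer_index * wavelength / math.sin(beta)

# hypothetical readings: heights of layer lines 1 to 3 above the equator, in mm
heights = {1: 1.54, 2: 3.10, 3: 4.68}
for l, y in heights.items():
    print(l, round(axis_length(l, y, film_radius=30.0), 1))
```

Each layer line gives an independent estimate of the same axis, so agreement between them is a simple check that no weak layer line has been overlooked.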

If n is the number of molecules per unit cell, M the molecular weight, 
N the Avogadro number, V the volume of the unit cell, and ρ the density 
of the crystal, expressing the mass of the unit cell in two different ways 
gives the following relations: 

$$\frac{nM}{N} = V\rho \qquad (2)$$

$$n = \frac{NV}{M}\rho \qquad (3)$$

$$M = \frac{NV}{n}\rho \qquad (4)$$
n 




FIGURE 2. Schematic representation in two dimensions of a crystalline structure and the 
corresponding Patterson plot. 


If one knows the molecular weight M, the number of molecules per 
unit cell can be calculated from (3). If one knows the number of 
molecules per cell, the molecular weight can be calculated from (4). 
In general M cannot be obtained uniquely from X-ray data alone. 
Using (4) one can calculate a series of possible molecular weights to 
be compared with determinations by other methods. Often the lattice 
and the symmetry of the crystal will impose certain restrictions upon 
n and consequently upon M. The X-ray value of the molecular 
weight of a protein means one of the series of values calculated by (4) 
which agrees best with ultracentrifuge values, and involves a value 
of n which is compatible with the symmetry of the crystal. The 




numerical value of M is uncertain to the same extent as the value of 
the density which is used in the computation. 
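
The series of possible molecular weights given by (4) can be tabulated as follows. This is a sketch under stated assumptions: the cell volume, density, and reference weight are hypothetical numbers, the modern value of the Avogadro number is used, and no allowance is made for solvent contained in the crystal.

```python
AVOGADRO = 6.022e23

def candidate_weights(cell_volume_A3, density, max_n=8):
    """M = N V rho / n for n = 1 .. max_n molecules per unit cell.

    cell_volume_A3 is in cubic Angstroms, density in grams per cm^3.
    """
    cell_mass = AVOGADRO * cell_volume_A3 * 1e-24 * density
    return {n: cell_mass / n for n in range(1, max_n + 1)}

def best_match(candidates, reference_weight):
    """The (n, M) pair lying closest to a weight found by another method."""
    return min(candidates.items(),
               key=lambda item: abs(item[1] - reference_weight))

# hypothetical cell of 200,000 cubic Angstroms and density 1.3
candidates = candidate_weights(2.0e5, 1.3)
print(best_match(candidates, 40000))
```

In practice the admissible values of n would first be restricted to those compatible with the symmetry of the crystal, and the comparison made only among the survivors.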

The next step in the structure determination involves finding the 
position of each atom in the unit cell. This is by far the most difficult 
step; there is no straightforward method, and often it has turned out 
to be too difficult to be carried through. The positions of the atoms 
determine the intensities of the diffracted beams, and atomic positions 
must be deduced from intensity considerations. 

The use of symmetry considerations in determining atomic positions 
is often of great help. This kind of reasoning is known as application 
of space group theory. It turns out that there are 230 possible com- 
binations of symmetry operations which can exist in a crystalline 
structure. The statement of the space group of a crystal is simply 
the statement of the kind of symmetry operations and their spatial 
arrangement. 

The structure factor, F, is the amplitude of scattered radiation per 
unit cell. It involves the positions of the atoms in the unit cell (coordinates 
x, y, z) and is given by the expression 

$$F_{hkl} = \sum_n f_n \, e^{2\pi i \left( \frac{h x_n}{a} + \frac{k y_n}{b} + \frac{l z_n}{c} \right)} \qquad (5)$$

h, k, l are the Miller indices of the diffracting planes, and $f_n$ is the amplitude 
of scattered radiation from atom n. What one observes experimentally 
is the integrated intensity of a diffracted beam, and this 
quantity is given by the equation 

$$I_{hkl} = K F_{hkl}^2 \qquad (6)$$

The factor K is known for any experimental arrangement and 
diffracted beam hkl, so that from the intensities of the diffracted beams 
one gets directly a series of values for $F_{hkl}^2$. 

Right at this point comes the stumbling block of X-ray crystal- 
lography. Since it is F? which is determined experimentally, one gets 
only the magnitude of F without knowing the phase. For example, 
in a simple case one does not know from measurements of F? whether 
F is positive or negative. If one could measure F directly it would 
be possible by Fourier series methods to plot out the distribution of 
scattering matter in the unit cell, and hence to determine the atomic 
positions directly from the X-ray data. However, this is impossible. 
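
The loss of phase can be made concrete with a small numerical sketch. It evaluates (5) for a hypothetical three-atom cell, using fractional coordinates x/a, y/b, z/c, and shows, as one simple instance of what the intensities do not carry, that moving the origin alters F while leaving F² unchanged. The scattering amplitudes and positions are invented for the illustration.

```python
import cmath

def structure_factor(atoms, h, k, l):
    """F_hkl for atoms given as (f, x, y, z), with fractional coordinates."""
    return sum(f * cmath.exp(2j * cmath.pi * (h * x + k * y + l * z))
               for f, x, y, z in atoms)

def shifted(atoms, dx, dy, dz):
    """The same structure referred to a displaced origin."""
    return [(f, x + dx, y + dy, z + dz) for f, x, y, z in atoms]

# hypothetical cell: scattering amplitude and fractional position of each atom
atoms = [(6.0, 0.10, 0.20, 0.30),
         (8.0, 0.40, 0.15, 0.75),
         (7.0, 0.70, 0.60, 0.05)]
moved = shifted(atoms, 0.25, 0.0, 0.0)

for hkl in [(1, 0, 0), (1, 1, 0), (2, 1, 3)]:
    F1 = structure_factor(atoms, *hkl)
    F2 = structure_factor(moved, *hkl)
    # the amplitudes differ in phase, the squared magnitudes agree
    print(hkl, F1, F2, abs(F1) ** 2, abs(F2) ** 2)
```

The calculation runs only one way: any assumed structure yields its F² values at once, whereas the F² values alone do not single out a set of atomic positions.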

We are forced to carry on by indirect methods which fall into three 
classes. 




(a) Cut and try methods. Using all available physical, chemical, 
and symmetry data to guess reasonable structures, and then 
comparing the calculated F? values with those obtained from 
the intensities. Notice the one-way nature of the equations: 
from a structure one can calculate all the F? values, but from 
the F? values one cannot directly calculate a structure. 

(b) Use of heavy atoms, either in the structure, or put in by 
isomorphous replacement. Sometimes this allows the positions 
of the heavy atoms to be determined independently; sometimes 
it is used to determine the phase of reflections; sometimes it 
is used in connection with Patterson plots. 

(c) The method of F? series, Patterson or Patterson-Harker plots. 
In case no information is available other than the X-ray in- 
tensity data, the only method of making progress involves using 
the Patterson type Fourier plots. A Patterson plot is given 
by a Fourier series in which the coefficients are the $F_{hkl}^2$ 
quantities, which themselves are determined directly from the 
experimental intensities. 


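$$V(X, Y, Z) = \sum_{h=-\infty}^{+\infty} \sum_{k=-\infty}^{+\infty} \sum_{l=-\infty}^{+\infty} F_{hkl}^2 \cos 2\pi \left( \frac{hX}{a} + \frac{kY}{b} + \frac{lZ}{c} \right) \qquad (7)$$

The Patterson 2-dimensional projection is given by the equation 

$$V(X, Y) = \sum_{h=-\infty}^{+\infty} \sum_{k=-\infty}^{+\infty} F_{hk0}^2 \cos 2\pi \left( \frac{hX}{a} + \frac{kY}{b} \right) \qquad (8)$$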


Since the $F_{hkl}^2$ values are determined directly by experiment, it is 
evident that the Patterson type series can be evaluated and plotted 
out. It should be emphasized that a Patterson plot does not directly 
represent a structure. 

The significance of a Patterson plot is illustrated in FIGURE 2. The 
upper figure shows four unit cells of a crystal containing three atoms 
per unit cell. In the lower figure the position of each peak with respect 
to the origin gives the position of some atom in the structure 
with respect to another atom. For example, the displacement of the 
peak B1C1 from the origin of the plot represents the displacement of 
the atom C1 from atom B1. The peaks of a Patterson plot give the 
positions of each atom with respect to every other atom, and not the 
positions of the atoms with respect to a single origin. 
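
A one-dimensional sketch may make the meaning of the peaks clearer. Assuming a hypothetical cell containing two equal atoms at x = 0.10 and 0.40 (fractional coordinates), the code below forms F² from the structure, sums the one-dimensional analogue of (8) over a finite range of h, and lists the strongest peaks. The atoms, the cut-off in h, and the function names are all choices made for the illustration.

```python
import math

def intensities(atoms, h_max):
    """F_h^2 for a one-dimensional cell; atoms are (f, x), x fractional."""
    result = {}
    for h in range(-h_max, h_max + 1):
        re = sum(f * math.cos(2 * math.pi * h * x) for f, x in atoms)
        im = sum(f * math.sin(2 * math.pi * h * x) for f, x in atoms)
        result[h] = re * re + im * im
    return result

def patterson(f_squared, n_points=100):
    """V(X) = sum over h of F_h^2 cos(2 pi h X), sampled at X = j / n_points."""
    return [sum(f2 * math.cos(2 * math.pi * h * j / n_points)
                for h, f2 in f_squared.items())
            for j in range(n_points)]

def peaks(values, count):
    """Grid indices of the `count` highest local maxima of a periodic curve."""
    n = len(values)
    local = [j for j in range(n)
             if values[j] > values[j - 1] and values[j] > values[(j + 1) % n]]
    return sorted(sorted(local, key=lambda j: -values[j])[:count])

atoms = [(1.0, 0.10), (1.0, 0.40)]        # hypothetical two-atom structure
p = patterson(intensities(atoms, h_max=10))
print(peaks(p, 3))
```

The three strongest peaks fall at the origin and at X = 0.30 and 0.70, that is, at the displacement of one atom from the other and its reverse, and not at the atomic positions 0.10 and 0.40. The weaker ripples between them arise from breaking off the series at a finite h.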

It is obvious that for a complex structure the Patterson plot may 
be very involved and the interpretation extremely difficult. Whereas 
in theory it should be possible to work out quite complex structures 




from a Patterson plot, in actual practice it is far more difficult than 
would be imagined. The difficulties are mostly of a practical nature: 
ghost peaks due to breaking off the series, effects of errors in the 
experimental intensities, and, worst of all, superposition and over- 
lapping of the peaks. 


EVIDENCE FROM X-RAYS REGARDING THE 
STRUCTURE OF PROTEIN MOLECULES 


By I. FANKUCHEN* 
From Massachusetts Institute of Technology, Cambridge, Massachusetts 


The problem of the internal structure of the protein molecule is one 
on which very little real progress has been made since the fundamental 
researches of Fischer. His work showed that proteins were composed 
of a-amino acid residues bound together by peptide links. Other papers 
presented at this conference have dealt in detail with the chemical 
evidence on the constitution of proteins. 

In recent years, the application of X-ray diffraction methods to the 
study of protein structures has offered a new approach to this very 
fundamental problem. Astbury’s1, 2 work has given us much information 
as to the stereochemical configuration of the fibrous proteins. 
These contain polypeptide chains, fully extended as in silk or folded 
in various ways as in wool. The detailed nature of the folding, how- 
ever, has not been established even in these comparatively simple 
proteins. The crystalline proteins, when denatured, also appear to 
contain polypeptide chains. More recently, the use of X-ray methods 
has been extended to the study of the native crystalline proteins. The 
diffraction effects are quite complex and at certain stages of the an- 
alysis, differences of opinion concerning the interpretation of the results 
have arisen. Enough work has by now been published to make a 
critical examination of the techniques and data desirable. Such a 
critical examination should show where assumptions enter and by 
making clearer the distinction between fact and theory help put the 
X-ray study of proteins on a sounder basis. 

The first X-ray studies in which diffraction effects were clearly ob- 
tained all used the powder technique. The specimen consists of a poly- 
crystalline mass of material irradiated by a comparatively mono- 
chromatic X-ray beam. By the use of this method, squash seed 
globulin,⁴ chymotrypsinogen,⁵ the Bence-Jones protein,⁶ urease and


* National Research Fellow in Protein Chemistry, Massachusetts Institute of Technology.
1 Astbury, W. T. Jour. Textile Inst. 27: 281. 1936.
2 Astbury, W. T. Science Progress No. 133. July 1939.
3 Astbury, W. T. Nature 140: 968. 1937.
4 Astbury, W. T., & Lomax, R. Jour. Chem. Soc. 846. 1935.
5 Corey, R. B., & Wyckoff, R. W. G. Jour. Biol. Chem. 114: 407. 1936.
6 Magnus-Levy, A., Meyer, K. H., & Lotmar, W. Nature 137: 616. 1936.




pepsin,⁷ egg albumin,⁸ and hemoglobin⁹, ¹⁰ were all shown to give sharp
diffraction rings. These observations were taken as proving the true 
crystallinity of the materials studied. This conclusion, while un- 
doubtedly correct in the above cases, does not necessarily follow from 
the observation of sharp diffraction rings. The possibility of error 
arises from the great size of the molecules. It is true that the produc- 
tion of sharp diffraction effects indicates a periodic structure, but such 
a structure may well exist within the molecule itself and the rings may 
therefore be no proof of the crystalline arrangement of the molecules 
relative to one another. Such a situation has been found to be the 
case for tobacco mosaic virus.¹¹ Of course, if the particles are large
enough to be studied in the microscope and are there seen to be truly
crystalline, such considerations need not disturb us. Except for simple 
cases like cubic and hexagonal close-packed systems, powder data from 
proteins by themselves are not easy to interpret. One can compute the
spacings responsible for the diffraction rings and, should the need
arise, settle questions of identity, but it is difficult to go much
further. For this reason, the data from the above researches will not
in general be considered in the discussion to follow and are not included 
in the tabulated data. 

The study of protein structures by single crystal methods has, up to 
now, been done primarily by Bernal and his school. TABLE 1, in addition
to the results of single crystal studies, also includes two proteins,
tobacco seed globulin and the bushy stunt virus of tomato, which were 
studied by the powder method. The symmetry of these two proteins, 
both cubic, permitted unit cell and molecular weight determinations to 
be made and they were therefore included. The unit cells for insulin 
and excelsin are referred to orthohexagonal axes. Their primitive 
cells are rhombohedral and contain only one molecule per unit cell. 

The values of the unit cell dimensions are obtained from measure- 
ments of the locations of the X-ray reflections and therefore constitute 
the surest quantitative information we can derive from these X-ray 
studies. For ordinary crystals the accuracy of spacing measurements 
can be pushed as high as 1 in 100,000 but for protein crystals nothing 
like this accuracy can be hoped for. The small size of most protein 
crystals, the magnitude of the spacings (and consequent small angles 
of deviation of the X-ray beam), the imperfections of the crystals and 


7 Fankuchen, I. Jour. Am. Chem. Soc. 56: 2398. 1934.
8 Clark, G. L., & Shenk, J. H. Radiology 28: 58. 1937.
9 Clark, G. L., & Shenk, J. H. Radiology 28: 144. 1937.
10 Wyckoff, R. W. G., & Corey, R. B. Science 81: 365. 1935.
11 Bawden, F. C., Pirie, N. W., Bernal, J. D., & Fankuchen, I. Nature 138: 1051. 1936.




the low intensity of the reflections all combine to make accurate 
measurements difficult. With reasonable precautions the probable 
error can be reduced to one part in 250 and consequently the volume 
of the unit cell can be found to within approximately 1%. For present
needs this accuracy is adequate. Future developments may require 
higher accuracy in spacing determination and it is probable that the 
accuracy of the spacing determination can be increased to a limit set 
by the inherent imperfections of the periodic structure of the crystals. 
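
The step from an error of one part in 250 in the spacings to an error of the order of one per cent in the cell volume can be sketched numerically. The following is a minimal illustration, not part of the original argument; it assumes the volume is a product of three measured edges and shows both the case of independent errors and the case of a shared (fully correlated) error. The function name is hypothetical.

```python
import math

def volume_relative_error(edge_relative_errors, independent=True):
    """Relative error of a cell volume proportional to the product a*b*c."""
    if independent:
        # Independent random errors on the edges combine in quadrature.
        return math.sqrt(sum(e * e for e in edge_relative_errors))
    # A shared error (for example in the camera calibration) adds linearly.
    return sum(edge_relative_errors)

edge_error = 1 / 250  # one part in 250 on each spacing, as quoted in the text
independent_case = volume_relative_error([edge_error] * 3)
correlated_case = volume_relative_error([edge_error] * 3, independent=False)
print(f"independent errors: {independent_case:.2%}")
print(f"correlated errors:  {correlated_case:.2%}")
```

Either way the result is of the order of one per cent, which is the figure relevant to the molecular weights discussed below.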

If the volume of the unit cell is known, the determination of the 
density enables the direct computation of the mass of the unit cell to be 
made. The mass of the hydrogen atom (atomic weight 1.008) is
1.66 × 10⁻²⁴ grams. Thus the molecular weight of the cell contents
can be obtained by dividing the mass of the unit cell by 1.65 × 10⁻²⁴.
If the number, n, of molecules in the cell is known, a simple division 
gives the molecular weight of the protein molecule. A number of ques- 
tions may now well be asked. What densities are to be used and how 
are they to be measured? How is n, the number of molecules in the 
unit cell, determined? In the attempt to answer these questions, we 
become aware of some serious difficulties. 
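
The arithmetic just described may be written out as a short sketch. It assumes the cell volume is given in cubic angstroms and the density in grams per cubic centimetre, and it uses the divisor 1.65 × 10⁻²⁴ quoted in the text. The function names and the numerical example are hypothetical and are not taken from TABLE 1.

```python
MASS_UNIT_G = 1.65e-24   # grams per unit of molecular weight, the divisor quoted in the text
A3_TO_CM3 = 1e-24        # one cubic angstrom expressed in cubic centimetres

def cell_molecular_weight(volume_A3, density_g_cm3):
    """Molecular weight of the whole cell contents."""
    mass_g = volume_A3 * A3_TO_CM3 * density_g_cm3
    return mass_g / MASS_UNIT_G

def molecular_weight(volume_A3, density_g_cm3, n):
    """Molecular weight per molecule, given n molecules in the cell."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return cell_molecular_weight(volume_A3, density_g_cm3) / n

# Hypothetical cell: 200,000 cubic angstroms, density 1.30, four molecules.
example = molecular_weight(volume_A3=200_000, density_g_cm3=1.30, n=4)
print(round(example))
```

The sketch makes plain that the result is directly proportional to the density used, which is why the questions raised above about the measurement of density matter so much.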

Single protein crystals have been studied both dry and immersed in 
solutions and wherever both wet and dry crystals of the same protein 
have been used, appreciable differences in the unit cells occur (see 
TABLE 1). In computing molecular weights, the density to be used is 
the density of the crystal in the medium in which it is studied—for dry 
crystals the density in air and for wet crystals, the density in the 
solution. However, to measure densities, immersion in some liquid is 
usually necessary. It has been shown by several workers¹², ¹³ that
appreciable differences in the densities are found depending on the medium
used to immerse the crystals. It has also been shown¹⁴ that variations
in the cell dimensions exist when the crystals are studied in different
solutions. The molecular weights obtained from centrifuge data are
presumed to correspond to anhydrous molecular weights¹⁵ and are
therefore to be compared directly with the molecular weights
determined from X-ray measurements on dry crystals (values corrected
for residual water where possible). But this comparison is meaningful
only if the density used in the X-ray determination is accurately
the density of the dry crystal. Similar difficulties occur for the wet
crystals. These give the best X-ray diffraction effects and consequently

12 Adair, G. S., & Adair, M. E. Proc. Roy. Soc. London B120: 422. 1936.
13 Crowfoot, D. Proc. Roy. Soc. London A164: 580. 1938.
14 Crowfoot, D., & Riley, D. Nature 144: 1011. 1939.
15 Svedberg, T. Proc. Roy. Soc. London B127: 7. 1939.

a knowledge of the accurate X-ray wet molecular weights
would be most useful if we knew precisely what they meant. Thus the 
dry molecular weights could be computed if we knew the state of 
hydration of the wet crystals. Until we know more about the densities 
of protein crystals and the role of the ions in the solutions in which 
they are immersed, any comparisons should be made very cautiously. 

The determination of the number, n, of molecules in the unit cell 
always involves the introduction of non X-ray data. The number 
must be an integer, usually a small one, and the symmetry of the 
space group imposes further restrictions on the possible values of n. 
Dividing the cell molecular weight by the possible values of n gives 
a set of possible molecular weights, the maximum of which is the cell 
molecular weight. One then chooses such a value for n from the set 
of possible values as will give a molecular weight in best agreement 
with the values given by other techniques: ultracentrifuge, chemical 
analysis, etc. In general, a very rough idea of the molecular weight 
is sufficient to define n. This procedure sometimes imposes restrictions 
on the molecular symmetry. Thus in insulin, where there is one 
molecule in the unit cell, the molecules must have trigonal symmetry 
to within the accuracy of the X-ray data; and in horse methemoglobin, 
the choice of the molecular weight makes n = 2, and thereby imposes 
a twofold axis of symmetry on the molecule. The question may be 
purely academic, but it does seem possible that in some protein crystals 
the unit may be different from what it appears to be in the ultracen- 
trifuge, let us say. In such a case, if we make use of the ultracentri- 
fuge value in determining n, some erroneous deductions as to the size 
and symmetry of the protein unit in the crystal will then result. 
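
The procedure for fixing n can be sketched as follows. This is an illustration under stated assumptions, not the method of any particular worker: the cell molecular weight and the list of values of n permitted by the space group are hypothetical, and only the reference value 66,700 (the chemical value for hemoglobin given in the notes to TABLE 1) comes from the paper. Agreement is judged here by the ratio of the two weights, which is one reasonable choice among several.

```python
import math

def candidate_weights(cell_weight, allowed_n):
    """Molecular weight implied by each permitted number of molecules per cell."""
    return {n: cell_weight / n for n in allowed_n}

def choose_n(cell_weight, allowed_n, reference_weight):
    """Pick the n whose implied weight is closest, as a ratio, to the reference."""
    candidates = candidate_weights(cell_weight, allowed_n)
    return min(candidates, key=lambda n: abs(math.log(candidates[n] / reference_weight)))

cell_weight = 135_000       # hypothetical cell molecular weight
allowed_n = (1, 2, 4)       # hypothetical values permitted by the space group
reference = 66_700          # non X-ray value used only as a rough guide

n = choose_n(cell_weight, allowed_n, reference)
print(n, candidate_weights(cell_weight, allowed_n)[n])
```

Because the candidates differ by factors of two or more, a very rough reference value suffices, as the text notes; the same sketch also shows how a wrong reference (a unit differing from that seen in the ultracentrifuge) would silently select a wrong n.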

Some further rough idea as to the size and shape of the molecules 
may be obtained from a knowledge of the unit cell size and space 
group. This about exhausts the information one can obtain from a 
study of the location of the reflections without considering their relative 
intensities. The choice of the space group depends to a large extent 
on a study of systematic absences of reflections—a special case of 
intensity of reflection zero—but such determinations of systematic 
absences are fairly free from pitfalls or ambiguities. 

Such a statement unfortunately cannot be made of the extension 
of the analysis of the data to a study of the relative intensities of the 
reflections. The relative intensities of the various reflections are de- 
termined by the distribution of scattering matter within the unit cell 
and it is the ultimate goal of the structure analysis to work backwards 
from the intensities to the distribution. In the simplest crystals, the
X-ray data alone will suffice to determine the structure but it seems a
fairly general rule that as the structures grow more complicated, an 
increasing amount of outside information is necessary. (In special
cases¹⁶ this information may consist of X-ray data on isomorphous
crystals, and the suggestion has been made¹⁷ that this method be
applied to protein X-ray studies.) At the same time the way the X-ray
data are used changes. Thus, in the simpler structures, one may work 
either from the X-ray data to the structure without making use of any 
preconceptions as to the answer, or reverse the procedure by assuming 
a structure (enough is now known of crystal chemistry to enable 
shrewd and often accurate guesses to be made), and then testing the 
assumption by computing what its X-ray reflections should be. As 
the structures grow more complicated, the second method tends to be 
used increasingly, and when structures as complex as the proteins are 
reached, one is sorely tempted to use the second method exclusively. 
Given a set of X-ray intensities, we can assume a structure based on 
some theory and then test our theory by a comparison of observed and 
computed intensities. For proteins, however, this testing must be done 
very carefully with one eye on the character of the data and another 
on the theory to be tested. Thus, if the X-ray data are limited in 
scope (as is often the case for proteins) agreement between observed 
and computed intensities can say nothing about any fine structure of 
the theory and indeed need not even be a confirmation of the large-scale
details of the structure being tested.¹⁸, ¹⁹

Such questions are perhaps more easily discussed in terms of Fourier 
series and Patterson diagrams. Sir William Bragg showed²⁰ that the
three-dimensional periodic structure of scattering matter in a crystal 
could be expressed as a three-dimensional Fourier series, the coefficients 
of which are the structure factors, F, of the various X-ray reflections. 
Unfortunately, we do not observe F directly. The X-ray intensity is 
proportional to |F|² and in general F may be a complex number whose
amplitude is determinable from the X-ray intensities but whose phase 
is not. Various methods have been used for determining these un- 
known phases but so far they have not been applied to protein data. 
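
The loss of phase information can be shown on a toy example. The sketch below is purely illustrative: it uses a hypothetical one-dimensional "structure" sampled on eight grid points rather than a real crystal, and the function names are assumptions. It shows that the full set of F values returns the structure, that the amplitudes alone (with the phases arbitrarily set to zero) do not, and that the same atoms placed differently in the cell give identical amplitudes.

```python
import cmath

def structure_factors(density):
    """Fourier coefficients F(h) of a periodic density sampled on a 1-D grid."""
    n = len(density)
    return [
        sum(rho * cmath.exp(2j * cmath.pi * h * x / n) for x, rho in enumerate(density))
        for h in range(n)
    ]

def synthesis(coefficients):
    """Fourier series built from a full set of coefficients."""
    n = len(coefficients)
    return [
        sum(F * cmath.exp(-2j * cmath.pi * h * x / n)
            for h, F in enumerate(coefficients)).real / n
        for x in range(n)
    ]

cell = [0.0, 3.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]   # hypothetical 1-D structure
F = structure_factors(cell)
amplitudes = [abs(f) for f in F]                  # all that the intensities provide

with_phases = synthesis(F)                        # amplitudes and phases together
without_phases = synthesis(amplitudes)            # phases arbitrarily set to zero

shifted = cell[3:] + cell[:3]                     # same atoms, different origin
shifted_amplitudes = [abs(f) for f in structure_factors(shifted)]

print([round(v, 3) for v in with_phases])
print([round(v, 3) for v in without_phases])
```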

Patterson²¹, ²² considered the Fourier series in which values of |F|²
were used as coefficients instead of values of F, and showed that such 

16 Robertson, J. M. Jour. Chem. Soc. 615. 1935; 1195. 1936.
17 Robertson, J. M. Nature 143: 75. 1939.
18 Bragg, W. L. Nature 143: 73. 1939.
19 Bernal, J. D. Nature 143: 74. 1939.
20 Bragg, W. H. Phil. Trans. A215: 253. 1915.
21 Patterson, A. L. Zeit. Krist. 90: 517. 1935.
22 Patterson, A. L. Zeit. Krist. 90: 543. 1935.




a series could be given a physical meaning. The three-dimensional 
summation of |F|² terms gives a periodic assemblage of peaks which
represent the ends of interatomic vectors drawn from a common origin. 
Analogous to the case in the F series, two- and one-dimensional
summations may be made which use only part of the available X-ray
data and represent projections of the three-dimensional peak system 
onto a plane or line, respectively. Such two-dimensional projections 
have been most widely used for two reasons. The amount of data 
required is much less than for the three-dimensional summation, and 
the task of computing the series is also not so involved. The great 
drawback of the two-dimensional projection is the superposition of 
peaks which makes the unravelling of the pattern a most difficult and 
often impossible task. Harker²³ has pointed out that usually only
certain sections of the three-dimensional summations are important. 
The computation then resolves itself into the summation of two- and 
one-dimensional series although for such sections the complete set of
X-ray intensities is used. Robertson²⁴ has recently reviewed this
field in a most comprehensive and clear paper. 
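
The physical meaning of the |F|² series can be illustrated in one dimension. In the sketch below, which rests on hypothetical atom positions and weights on a grid of sixteen points, the Patterson function is computed twice: directly, as the sum over all pairs of scattering points separated by a given vector, and as a cosine series with |F|² coefficients. The two agree, and the peaks fall at the interatomic vectors. The function names are assumptions made for the illustration.

```python
import math

def patterson(density):
    """P(u) = sum over x of rho(x) * rho(x + u), for a periodic density."""
    n = len(density)
    return [sum(density[x] * density[(x + u) % n] for x in range(n)) for u in range(n)]

def patterson_from_intensities(density):
    """The same function as a cosine series whose coefficients are |F(h)|^2."""
    n = len(density)
    intensity = []
    for h in range(n):
        re = sum(rho * math.cos(2 * math.pi * h * x / n) for x, rho in enumerate(density))
        im = sum(rho * math.sin(2 * math.pi * h * x / n) for x, rho in enumerate(density))
        intensity.append(re * re + im * im)
    return [
        sum(i * math.cos(2 * math.pi * h * u / n) for h, i in enumerate(intensity)) / n
        for u in range(n)
    ]

atoms = {2: 6.0, 5: 8.0, 11: 7.0}                 # hypothetical positions and weights
cell = [atoms.get(x, 0.0) for x in range(16)]

direct = patterson(cell)
series = patterson_from_intensities(cell)
peaks = {u: round(v) for u, v in enumerate(direct) if v > 0}
print(peaks)
```

Three atoms already give six peaks besides the one at the origin; with the thousands of atoms in a protein the number of vectors grows as the square of the number of atoms, which is the source of the superposition discussed in the text.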

Theoretically, the Patterson peaks can be related to interatomic 
vectors, and a knowledge of the contents of the unit cell should enable 
one in most cases to determine an unique periodic distribution of scat- 
tering matter which would correspond to the experimentally deter- 
mined Patterson diagram. In the case of protein crystals, superposi- 
tion of vectors occurs and the few peaks found in protein Patterson 
diagrams are the result of much collaboration between vectors. The 
Patterson diagrams can then test only gross features of the proposed 
structure. Indeed, while the Patterson summation contains all the 
X-ray data and theoretically an harmonic analysis of the summation 
would give the original set of F² values uniquely, yet the Patterson
diagram, because it is a summation, conceals the fact that scores of 
individual intensities went into its making. It may be the best way 
of presenting the totality of X-ray data, but that does not necessarily 
make it the best way of using the data to test proposed structures. 
It may, for this purpose, be desirable to make use of the individual 
intensities. Here, absolute intensities would be most valuable. In 
any event, a limit is set by the minimum spacings observed on the fine 
grain structure which can be tested. 

It would appear desirable, from this discussion, to work if possible 
from the X-ray data to a structure instead of attempting to use the 


23 Harker, D. Jour. Chem. Phys. 4: 381. 1936.
24 Robertson, J. M. Rept. Progress Phys. Physical Soc. 4: 332. 1937. 




X-ray data to test structures. The rate of progress may then be
painfully slow, but to compensate, such a procedure reduces the danger of
believing that more has been proved than is actually the case. The 
whole problem is a difficult one. However, enough progress has al- 
ready been made so that one may justly believe that a slow, cautious 
approach will materially contribute to a fuller knowledge of the struc- 
ture of proteins. 

An attempt will now be made to summarize the information we 
have obtained about protein structures as a result of X-ray studies. 
The cell size and molecular weight determinations are listed in TABLE 1. 
A survey of the character of the X-ray data gives some information as 
to the regularity of the structures. Uniformly, the crystals immersed 
in solutions give far better photographs than the dry crystals, the 
reflections being more numerous and intense, better defined and what 
is most significant, reaching out to much smaller spacings than is the 
case for the dry crystals. This suggests not only some disorientation 
of the molecules in the dry crystals, but probably also some minor 
rearrangement within the molecules. In all cases where both wet and 
dry crystals of the same protein have been studied, cell shrinkages 
have been observed. It is reasonable to assume that upon drying most 
of the water is removed from between the molecules which then
rearrange themselves while remaining substantially unchanged as
molecules. It has been suggested²⁷ that by studying the changes in the
Patterson diagrams as we pass from wet to dry proteins, one will be 
able to separate the effects due to inter- and intramolecular scattering. 

Perhaps the most important characteristic of the intensity patterns 
is the general enhancement of the reflections in the neighborhoods of 
10 Å and 4.5 Å for wet crystals and 10 Å for the dry ones.²⁷, ³² These
dimensions were found by Astbury¹ to be the periodicities of the
packing of the polypeptide chains in directions at right angles to their
length, and correspond to the average length of the side chains and 
backbone spacing, respectively. It is thought that the change in 
going from a crystalline protein to the denatured form, while funda- 
mental, does not involve any major rearrangement of the constituent 


25 Crowfoot, D. Nature 135: 591. 1935.
26 Crowfoot, D., & Riley, D. Nature 141: 521. 1938.
27 Bernal, J. D., Fankuchen, I., & Perutz, M. Nature 141: 523. 1938.
28 Bernal, J. D., & Crowfoot, D. Nature 133: 794. 1934.
29 Crowfoot, D., & Fankuchen, I. Nature 141: 522. 1938.
30 Astbury, W. T., & Bell, F. O. Tab. Biol. 17 (1): 106. 1939.
31 Bernal, J. D., Fankuchen, I., & Riley, D. Nature 142: 1075. 1938.
32 Bernal, J. D. Nature 143: 663. 1939.




TABLE 1

[Table printed sideways in the original; its entries are not legible in this copy. For each protein treated in the notes below, and separately for dry and for wet crystals, it lists the cell edges a, b, c in Å (10⁻⁸ cm.), the angle β, c sin β, the volume of the cell in Å³, the volume per molecule, the density, the molecular weight, the molecular weight corrected for residual water (dry crystals only), the space group, the smallest observed spacings and n, together with the molecular weight from non X-ray data.]




X-RAY DATA 


Insulin: Crowfoot¹³, ²⁵ (dry), and Crowfoot & Riley¹⁴ (wet).

Lactoglobulin: Crowfoot & Riley. Further work is being carried on by these workers 
(private communication) including Fourier studies and X-ray studies of crystals in various 
states of hydration.⁴⁶

Chymotrypsin: Bernal, Fankuchen & Perutz,²⁷ Perutz.⁴⁵ (In the original
communication²⁷ n was taken as 4. It is now suggested by Perutz that n = 2. This gives a
molecular weight in better agreement with the data.) Crystals prepared by Northrop.

Hemoglobin: Bernal, Fankuchen & Perutz.²⁷ Further intensive study is being done
by Perutz (private communication).⁴⁵

Pepsin: Bernal & Crowfoot.²⁸ These measurements were the first on single protein
crystals and repetition of this work would be desirable.

Tobacco Seed Globulin: Crowfoot & Fankuchen.²⁹ Fairly old, dry crystals were used,
prepared by Vickery. Only three extremely faint powder lines were observed. It would
be desirable to have these measurements repeated and an effort to obtain data on wet
crystals would be worthwhile.

Excelsin: Astbury, Dickinson & Bailey³³ and Astbury & Bell (unpublished work).
Astbury & Bell³⁰ refer to this unpublished work in their paper in Tabulae Biologicae.

Bushy Stunt Virus: Bernal, Fankuchen & Riley.³¹ Material furnished by Bawden & Pirie.


NON X-RAY MOLECULAR WEIGHT DATA 


Insulin, lactoglobulin, pepsin and excelsin: values are taken from Svedberg¹⁵ and are
centrifuge measurements.

Chymotrypsin: 41,000, Kunitz & Northrop,⁴⁴ measured by osmotic pressure.

Hemoglobin: 66,700 is found by chemical methods. The ultracentrifuge value for
horse hemoglobin¹⁵ is 63,000-69,000.

Tobacco Seed Globulin: only the sedimentation constant 12.7 × 10⁻¹³ was known
(Philpot, unpublished data). This is similar to the values for the other seed globulins 
whose molecular weights are about 300,000. 

Bushy Stunt Virus: molecular weights 7,600,000 and 8,800,000, McFarlane & Kekwick.⁴¹

These molecular weights were used in determining n, the number of molecules per
unit cell.


DENSITIES 


Insulin: both needle-shaped and flat rhombohedra were studied by Crowfoot. These 
were shown to possess identical crystal structure. The needles were imperfect and gave 
density values a trifle lower than the rhombohedra. The value chosen was the highest 
observed and was obtained from the largest rhombohedral crystals. Residual water 
5.35 % in air-dried crystals was determined by drying under reduced pressure at 104°. 

Lactoglobulin: Wet Tabular, 1.257 in sugar solution (Crowfoot).2* Wet Needles, 
dissolved too rapidly to permit of any density determinations. Dry Crystals, found to be 
1.27 by immersion in o-dichlorbenzene and toluene.%s Crowfoot believes this value too low, 
due to occlusion of air and uses an assumed density of 1.31 (insulin) in computations of 
molecular weights. 

Chymotrypsin: Wet. Determined by Perutz.” Dry, value of 1.31 assumed.” 

Hemoglobin: Wet. Determined by Perutz.” Dry, value of 1.26 assumed to correspond 
with value for serum albumin as given by Chick and Martin. 

Tobacco Seed Globulin: measured in sodium phosphate buffer solution at pH 5.0.
Residual water 10.4%. Determined by drying in vacuum at 100°. 

Excelsin: Value of 1.31 assumed by the writer to permit computation of molecular 
weight. Table in paper by Astbury & Bell gives only unit cell data and dry molecular 
weight. Assumption is made that this value is corrected for residual water. 

Bushy Stunt Virus: Wet crystals, 1.286, measured by immersion. Dry crystals, 1.35,
from computations of McFarlane & Kekwick.⁴¹




atoms.³², ³³, ³⁴ Thus a single crystal of excelsin upon partial
denaturation³³ gives a composite fiber and single crystal photograph in which
the fibers are oriented in the direction of the crystal axes. Undoubt- 
edly, these observations are connected with something fundamental 
in protein architecture, but thus far no sure explanation has been 
advanced. It has been suggested³² that in crystalline proteins an
arrangement of side chains and backbone spacings occurs, similar to 
that found in fibrous proteins. This suggestion must be considered as 
purely tentative since enough confirmatory evidence is lacking. 

The protein crystal which has been studied most intensively is 
insulin.¹³, ¹⁴, ²⁵ The dry crystal which until recently was the only form
studied did not give reflections comparable in quantity or quality with 
those obtained from other protein crystals, but it possessed the great 
advantage of having only one molecule per unit cell. This fact re- 
sulted in a great simplification in the Patterson diagrams which were 
the first step in an attempted analysis of the X-ray data. Even then, 
the interpretation of these diagrams had led to acute controversy which 
it is not my intention to enter into here. The papers dealing with
this subject are given below.¹⁴, ¹⁷, ¹⁸, ¹⁹, ³², ³⁴, ³⁵, ³⁶, ³⁷, ³⁸, ³⁹, ⁴⁰, ⁴², ⁴³ *

It is desirable to see just how far the study of insulin has been 
carried by Crowfoot.¹³ The unit cell determinations are quite
straightforward and the cell volumes are correct to about 1.5%. In both wet
and dry insulin, there is one molecule per unit cell whose space group 
is R3. This suggests a molecule having threefold symmetry but this 
symmetry need not extend to atomic dimensions in order to agree 
with the X-ray data. There is very good agreement between the 
X-ray and ultracentrifuge molecular weights. 

A total of fifty-nine reflections were observed, none corresponding 
to a spacing less than 7.05 Å. This minimum spacing at once imposes
a limit on what information can be obtained about the small-scale 


* A complete bibliography can be found in Pauling and Niemann's paper.³⁴
33 Astbury, W. T., Dickinson, S., & Bailey, K. Biochem. Jour. 29: 2351. 1935.
34 Pauling, L., & Niemann, C. Jour. Am. Chem. Soc. 61: 1860. 1939.
35 Bernal, J. D., Fankuchen, I., & Riley, D. Nature 143: 897. 1939.
36 Riley, D., & Fankuchen, I. Nature 143: 648. 1939.
37 Langmuir, I. Proc. Phys. Soc. 51: 592. 1939.
38 Wrinch, D. M. Proc. Roy. Soc. London A161: 505. 1937.
39 Wrinch, D. M. Trans. Faraday Soc. 33: 1368. 1937.
40 Langmuir, I., & Wrinch, D. M. Proc. Phys. Soc. 51: 613. 1939.
41 McFarlane, A. S., & Kekwick, R. A. Biochem. Jour. 32: 1607. 1938.
42 Bernal, J. D. Proc. Roy. Soc. London A170: 75. 1939.
43 Wrinch, D. M., & Langmuir, I. Jour. Am. Chem. Soc. 60: 2247. 1938.
44 Kunitz, M., & Northrop, J. H. Jour. Gen. Physiol. 18: 433. 1935.
45 Perutz, M. F. Ph.D. Thesis, University of Cambridge. 1939.
46 Crowfoot, D., & Riley, D. Unpublished data. 1940.




structure of dry insulin. The translation of the intensities into a set 
of F² values is a direct and simple process. Using these, one could
compute a three-dimensional Patterson summation, although the task 
would have been very tedious. Instead, Crowfoot used the data to 
compute one two-dimensional Patterson projection and four Harker 
two-dimensional sections. These were computed, referred to an hexagonal
unit cell, a = 74.8 Å, c = 30.9 Å, which contains three identical
molecules at the points (0, 0, 0), (1/3, 2/3, 1/3), (2/3, 1/3, 2/3). Due
to the rhombohedral symmetry of the crystal these five diagrams are 
sufficient to map the significant details of the three-dimensional
Patterson structure. The horizontal Harker sections are 5 Å apart,
sufficiently close together when used in conjunction with the vertical
sections to assure that no important peak will be missed. Crowfoot 
gave in a table (see TABLE 2) the approximate locations of four
representative peaks which, when operated upon by the symmetry
elements present, reproduced the entire structure of Patterson peaks.
She was unable to suggest a structure which could explain this peak 
system. Attempts have been made to explain the Patterson
diagrams⁴⁷, ⁴⁸ but due to the rather meagre data upon which any such
suggestions must at present be based, they should be considered as 
tentative. 
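
The geometry of the hexagonal cell used for dry insulin can be checked with a few lines of arithmetic. The sketch below takes a and c from the text, computes the volume of the hexagonal cell and the volume per molecule, and verifies that the three rhombohedral lattice points form a closed set of translations (adding any two gives a third, modulo whole cell edges). The function names are hypothetical; exact fractions are used so that the closure test involves no rounding.

```python
import math
from fractions import Fraction as Fr

def hexagonal_cell_volume(a, c):
    """Volume of a hexagonal cell: (sqrt(3) / 2) * a^2 * c."""
    return math.sqrt(3) / 2 * a * a * c

def add_mod1(p, q):
    """Add two fractional translations, reducing each coordinate modulo 1."""
    return tuple((x + y) % 1 for x, y in zip(p, q))

a, c = 74.8, 30.9                       # angstroms, as quoted in the text
centering = [
    (Fr(0), Fr(0), Fr(0)),
    (Fr(1, 3), Fr(2, 3), Fr(1, 3)),
    (Fr(2, 3), Fr(1, 3), Fr(2, 3)),
]

volume = hexagonal_cell_volume(a, c)
volume_per_molecule = volume / len(centering)
closed = all(add_mod1(p, q) in centering for p in centering for q in centering)

print(round(volume), round(volume_per_molecule), closed)
```

The volume per molecule obtained in this way is also the volume of the primitive rhombohedral cell, which contains the single molecule referred to earlier.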

TABLE 2

APPROXIMATE CO-ORDINATES OF REPRESENTATIVE PEAKS IN THE PATTERSON
DIAGRAM FOR DRY INSULIN (CROWFOOT¹³)

          x        y        z
A       .147     .027     .12
        .127     .113    −.25
B       .313     .047    −.17
        .30      .047     .33
C       .29      .19      0
D       0        0        .5

Recently, a preliminary note has been published by Crowfoot and 
Riley¹⁴ on wet insulin. The internal regularity of these crystals is
apparently very good, as the diffraction patterns are very much
better than for dry insulin, spacings as low as 2.4 Å being observed.
It seems likely that a comparison of the wet and dry insulin projections
and sections will permit the peaks to be identified with their
respective origins. Such a sorting out of the peaks would
be a big step forward towards finding the phases of the F terms. When 
these are all known, we will know what the X-rays can tell us about 
the structure of insulin. 
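
The significance of the smaller minimum spacing of wet insulin can be put in rough numbers by means of Bragg's law, nλ = 2d sin θ. The sketch below assumes copper Kα radiation (about 1.54 Å); the wavelength is an assumption, since the text does not state which radiation was used, and the function name is hypothetical. The last figure, the cube of the ratio of the two limiting spacings, is only a crude guide to the gain in the number of reflections, since it ignores the difference between the wet and dry cells.

```python
import math

CU_K_ALPHA_A = 1.5418   # assumed wavelength in angstroms; not given in the text

def bragg_angle_deg(spacing_A, wavelength_A=CU_K_ALPHA_A, order=1):
    """Bragg angle theta, in degrees, for a given spacing d."""
    s = order * wavelength_A / (2 * spacing_A)
    if not 0 < s <= 1:
        raise ValueError("spacing too small to reflect at this wavelength")
    return math.degrees(math.asin(s))

dry_limit, wet_limit = 7.05, 2.4   # smallest spacings quoted in the text, in angstroms

for d in (dry_limit, wet_limit):
    theta = bragg_angle_deg(d)
    print(f"d = {d} A: theta = {theta:.2f} deg, deviation 2*theta = {2 * theta:.2f} deg")

# The number of reflections inside a limiting sphere grows roughly as 1 / d_min^3.
gain = (dry_limit / wet_limit) ** 3
print(f"rough gain in number of reflections: {gain:.0f}-fold")
```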




To sum up: X-ray studies permit the determination of molecular 
weights which are in good agreement with those obtained by other 
methods. An abundance of X-ray data can be obtained to test
proposed theories. To be sure, the test at present is a negative one. No
theory can be accepted which does not agree with the X-ray evidence, 
but at present the converse of this statement is, unfortunately, not 
true. Progress is, however, being made and no doubt X-ray studies 
will continue to make substantial contributions to the steadily in- 
creasing store of knowledge that we have about proteins. 

Insulin is the only protein crystal on which any intensive X-ray
studies have been published. It would appear desirable that other 
protein crystals be more extensively studied in order to avoid the 
danger of using conclusions drawn from studies of a very special hor- 
mone, insulin, to symbolize the structure of proteins in general. It is 
known that hemoglobin⁴⁵ and lactoglobulin⁴⁶ are being further
investigated. The writer hopes to make an X-ray survey of those
protein crystals which have not been studied, in order to acquire enough
data to permit of the formulation of a classification based upon X-ray 
measurements, and also in order to determine those proteins which 
are likely to repay more intensive study.* 

* During 1939-1940, preliminary X-ray studies have been made by the writer (as
National Research Fellow in Protein Chemistry) on horse serum albumin, ribonuclease
(from Dr. M. Kunitz) and urease, as part of this suggested X-ray survey. The results
will be published shortly, it is hoped.


PUBLICATIONS 
OF THE 


NEW YORK ACADEMY OF SCIENCES 


(Lyceum of Natural History, 1817-1876)


(1) The Annals (octavo series), established in 1823, contain the scientific con- 
tributions and reports of researches, together with the records of meetings of the 
Academy. The articles which comprise each volume are printed separately, each 
in its own cover, and are distributed immediately upon publication. The price of 
the separate articles depends upon their length and the number of illustrations, and 
may be ascertained upon application to the Executive Secretary of the Academy. 


(2) The Transactions, reestablished as Series II in 1938, contain extended 
abstracts of papers presented before the regular sectional meetings of the Academy 
and other matters of general interest to Members, and are published monthly from 
November to June, inclusive. 


Current numbers of the Annals and Transactions are sent free to all Members 
of the Academy. The Annals are sent to Honorary and Corresponding Members 
desiring them. 


(3) The Special Publications, established in 1939, are issued at irregular intervals 
as cloth bound volumes. The price of each volume will be advertised at time 
of issue. 


(4) The Memoirs (quarto series), established in 1895, are issued at irregular 
intervals. It is intended that each volume shall be devoted to monographs relating 
to some particular department of Science. Volume I, Part 1, is devoted to 
Astronomical Memoirs, Volume II to Zoölogical Memoirs. No more parts of the
Memoirs have been published to date. The price is one dollar per part. 


(5) The Scientific Survey of Porto Rico and the Virgin Islands (octavo series), 
established in 1919, gives the detailed reports of the anthropological, botanical, 
geological, paleontological, and zoölogical surveys of these islands.


Subscriptions and inquiries concerning current and back numbers of any of the 
publications of the Academy should be addressed to 


EXECUTIVE SECRETARY


The New York Academy of Sciences 
care of 


The American Museum of Natural History, 
New York, N. Y. 


INTELLIGENCER PRINTING CO., LANCASTER, PA. 




